INFERENCE FOR DISTRIBUTIONAL EFFECTS USING
INSTRUMENTAL QUANTILE REGRESSION

VICTOR  CHERNOZHUKOV  AND  CHRISTIAN  HANSEN 


Abstract.  In  this  paper  we  describe  how  quantile  regression  can  be  used  to  evaluate 
the  impact  of  treatment  on  the  entire  distribution  of  outcomes,  when  the  treatment  is 
endogenous  or  selected  in  relation  to  potential  outcomes.  We  describe  an  instrumental 
variable  quantile  regression  process  and  the  set  of  inferences  derived  from  it,  focusing 
on  tests  of  distributional  equality,  non-constant  treatment  effects,  conditional  dominance, 
and  exogeneity.  The  inference,  which  is  subject  to  the  Durbin  problem,  is  handled  via  a 
method  of  score  resampling.  The  approach  is  illustrated  with  a  classical  supply-demand 
and  a  schooling  example.  Results  from  both  models  demonstrate  substantial  treatment 
heterogeneity  and  serve  to  illustrate  the  rich  variety  of  hypotheses  that  can  be  tested  using 
inference  on  the  instrumental  quantile  regression  process. 

1. Introduction

In this paper we describe how quantile regression can be used to evaluate the impact
of treatment on the entire distribution of outcomes, when the treatment is self-selected or
selected in relation to potential outcomes. We introduce an instrumental variable quantile
regression process and the set of inferences derived from it, focusing on tests of
distributional equality, non-constant treatment effects, conditional dominance, and exogeneity.
The inference, which is subject to the Durbin problem, is handled via a method of score
resampling. The paper describes these methods and establishes their asymptotic validity. The
approach is illustrated with a classical supply-demand and a schooling example.

Inference about distributional outcomes is crucial in a wide range of economic analyses.
For example, formal evaluation of a social program requires inference concerning the
nature, direction, and magnitude of the program's impact throughout the entire outcome
distribution, since evaluation involves integration of utility functions under alternative
distributions of outcomes. For further examples and previous literature, see Atkinson (1970),
Abadie (2002), Foster and Shorrocks (1988), Heckman and Smith (1997), and McFadden
(1989). Just as in classical p-sample theory, e.g. Doksum (1974) and Shorack and Wellner
(1986), this kind of inference is based on the empirical quantile regression process, which
has recently been explored by Koenker and Xiao (2002).

The  goal  of  this  paper  is  to  offer  an  empirical  instrumental  variable  quantile  regression 
(IV-QR)  process  and  a  set  of  inference  tools  derived  from  it.  Effectively,  instrumentation 
eliminates  the  endogeneity  and  selection  bias  commonly  occurring  in  observational  studies 
and  experiments  with  imperfect  compliance.  Thus,  the  IV-QR  process  allows  us  to  measure 
the  exogenous  treatment  effect  as  in  a  fully  controlled  experiment,  whereas  the  conventional 
QR  process  is  inherently  biased. 

Using  the  instrumental  variable  quantile  regression  process  we  describe  and  derive  the 
properties  of  a  class  of  tests  based  on  it  which  allow  us  to  examine: 

1.  The  hypothesis  of  distributional  equality,  or  whether  the  treatment  has  a  significant 
effect, 

2.  The  hypothesis  of  non-constant  or  varying  treatment  effect,  a  fundamental  hypothesis 
of  causal  analysis,  cf.  Heckman  (1990),  Doksum  (1974),  and  Koenker  and  Xiao  (2002), 

3.  The  hypothesis  of  conditional  stochastic  dominance,  a  fundamental  hypothesis  as  well, 
cf.  Abadie  (2002)  and  McFadden  (1989), 

4.  The  hypothesis  of  exogeneity,  or  whether  the  treatment  variable  is  exogenous,  another 
essential  hypothesis,  e.g.  Hausman  (1978). 


A difficulty which arises when implementing these tests is that some of them are subject
to the Durbin problem.¹ That is, the model's features or estimated nuisance parameters
induce parameter-dependent asymptotics, endangering distribution-free inference. A method
of score resampling, which bootstraps the scores or estimated influence functions without
recomputing the estimates, is suggested for generating asymptotically valid critical values
for these tests. A main advantage of this method is its computational simplicity, which
enables fast, practical implementation. The method is of independent interest in many
other settings, and its immediate applicability to other problems is assured by the
general conditions given in this paper. For example, it can be used for conventional quantile
regression.

The use of the approach is illustrated through two empirical examples. In the first, we
analyze the structure of demand for fish within a simultaneous equations demand system
with random coefficients, and in the second, we consider the effect of schooling on
earnings. We obtain clear evidence against the exogeneity and constant effect hypotheses, while
accepting the hypothesis of first order stochastic dominance in both of these examples.

Thus, our paper complements previous work in this direction, including that of
Hogg (1975), Abadie (1995, 2002), MaCurdy and Timmins (1998), McFadden (1989),
Hong and Tamer (2001), Abadie, Angrist, and Imbens (2001), Koenker and Xiao (2002),
Chernozhukov (2002) and others.

This paper accompanies our previous paper, Chernozhukov and Hansen (2001), that
focused on modeling and identification of quantile treatment effects in the presence of
endogeneity. The present paper goes further to establish the sampling properties of the entire
instrumental variable quantile regression process and of the inference processes and test
statistics derived from it. It also provides practical bootstrap tools to carry out the tests.

The  remainder  of  the  paper  is  organized  as  follows.  In  the  next  section,  we  briefly  discuss 
the  causal  model  and  its  examples  that  underlie  both  the  estimation  and  the  empirical 
examples  developed  in  this  paper.  Section  3  then  presents  the  IV-QR  process  and  develops 
its  sampling  theory.  Section  4  develops  inference  procedures  for  the  IV-QR  process  and 
presents practical inference for testing distributional hypotheses. The use of the methods
is illustrated via two empirical examples in Section 5, and Section 6 concludes.

2.  The  Causal  Model 

The following model is a simultaneous equations model. To describe it we will use a
conventional potential outcomes framework.² Potential or counterfactual real-valued outcomes
are indexed against treatment $D$ ($D \in \mathcal{D}$, a subset of $\mathbb{R}^l$), and denoted
$Y_d$, while potential treatment status is indexed against the instrument $Z$, and denoted $D_z$.
For example, $Y_d$ is an individual's outcome when $D = d$ and $D_z$ is an individual's
treatment status when $Z = z$.

¹ The term was coined by Koenker and Xiao (2002) to emphasize Durbin's contribution to the theory of
goodness-of-fit tests with estimated parameters and to have an easy way to refer to the problem.
² See e.g. Heckman and Robb (1986) and Imbens and Angrist (1994).

The potential or counterfactual outcomes $\{Y_d, d \in \mathcal{D}\}$, such as wages or demand, vary
across individuals or states of the world. Given the actual treatment $D$, the observed
outcome is

$$Y = Y_D.$$

That is, only the $D$-th component of $\{Y_d, d \in \mathcal{D}\}$ is observed. Typically $D$ is
selected in relation to potential outcomes, inducing endogeneity or sample selectivity.

The objective of our analysis is to learn about the marginal distributions or, equivalently,
the conditional quantiles of potential outcomes $Y_d$:

$$Q_{Y_d|X}(\tau), \quad \tau \in \mathcal{T},$$

where $\mathcal{T}$ is a closed subinterval of $(0, 1)$.

Quantiles of potential outcomes are a primary input to decisions about the efficiency
of treatment and social programs.³ The main obstacle to learning about the quantiles of
potential outcomes is sample selectivity or endogeneity: the observed $(Y, D)$ are typically
misleading about the quantities in question, see e.g. Heckman (1990).

2.1.  The  Model.  The  following  model  has  been  suggested  in  Chernozhukov  and  Hansen 
(2001).  This  model  rationalizes  a  Wald-type  estimating  equation  and  justifies  a  large  variety 
of  estimators  based  on  it.  In  this  model,  the  selection  of  treatment  D  by  the  individuals  is  left 
essentially  unrestricted.  It  is  assumed  that  there  exists  a  vector  of  instrumental  variables  Z 
that  affect  the  selection  of  D  but  do  not  affect  the  potential  outcomes.  Observed  individual 
characteristics  are  denoted  by  X. 

The main restriction of the model is the similarity assumption. The similarity assumption
states that conditional on the information that led to an individual's selection of treatment
state, the expectation of any function of the individual's rank does not vary across
treatments. In other words, the selection presumes that the ex-ante expectation of a person's
"ability", as measured by the person's rank in the distribution of unobservables, $U_d$ in
Assumption A1, relative to people with the same observed characteristics $(X, Z)$, does not
vary across the treatments.
Assumption 1. (IV-QR Model) For almost every value of $(X, Z) = (x, z)$,

A1 Potential Outcomes. Given $X = x$, for some $U_d \sim U(0, 1)$,

$$Y_d = q(d, x, U_d),$$

implying that $q(d, x, \tau)$ is the $\tau$-th quantile of $Y_d$.

A2 Selection. For unknown function $\delta(\cdot)$ and random vector $V$, the potential
treatment indexed by instrument status $z$, given $X = x$, takes the form

$$D_z = \delta(z, x, V).$$

A3 Independence. Given $X$, $\{U_d\}$ is independent of $Z$.

A4 Similarity. For each $d, d'$, given $(V, X, Z)$, $U_d$ is equal in distribution to $U_{d'}$.

A5 Observables consist of, for $U = U_D$,

$$Y = q(D, X, U), \quad D = \delta(Z, X, V), \quad X, \quad Z.$$

³ This has been discussed, for example, in Abadie (2002).

It is important to note that the model differs from the conventional selection model of
Heckman and from Angrist and Imbens' LATE. The differences mainly relate to the similarity
assumption and slightly different independence conditions. An extensive comparative
discussion of the IV-QR model is given in Chernozhukov and Hansen (2001). The role of each
assumption is perhaps best highlighted in the demand and schooling examples described
in the next two sections. The discussion is elaborate since these examples underlie the
empirical analysis in this paper.

2.2.  Example:  A  Demand  Model.  The  following  is  a  general  simultaneous  equation 
model,  and  many  classical  structural  models  can  be  written  as  special  cases.  Consider  the 
following  model 

i.   $Y_p = q(p, U)$   demand,

ii.  $\tilde{Y}_p = \rho(p, z, \mathcal{U})$   supply,   (2.1)

iii. $P \in \{p : q(p, U) = \rho(p, Z, \mathcal{U})\}$   equilibrium.

The map $p \mapsto Y_p$ is the random demand function, that is, it is the demand when the price
is $p$. Likewise, $p \mapsto \tilde{Y}_p$ is the random supply function, that is, the supply when the
price is $p$. Additionally, $Y_p$ and $\tilde{Y}_p$, $q(\cdot)$, and $\rho(\cdot)$ depend on the covariates
$X$, but this dependence is suppressed. Random variable $U$ is the level of the demand in the sense that

$$q(p, U) < q(p, U') \quad \text{when } U < U'.$$

Demand is maximal when $U = 1$ and minimal when $U = 0$, holding $p$ fixed. Likewise, $\mathcal{U}$ is
the level of supply. The $\tau$-quantile of the demand curve $p \mapsto Y_p$ is given by

$$p \mapsto Q_{Y_p}(\tau) = q(p, \tau).$$

Thus with probability $\tau$, the curve $p \mapsto Y_p$ lies below the curve $p \mapsto Q_{Y_p}(\tau)$.


The quantile treatment effect is characterized by an elasticity

$$q(p', \tau) - q(p, \tau) \quad \text{or, if defined, by} \quad \partial \ln q(p, \tau) / \partial \ln p.$$

The elasticity depends on the state of the demand $\tau$ (low or high) and may vary
considerably with $\tau$. For example, this variation could arise when the number of buyers varies
and aggregation induces non-constant elasticity across the demand levels as a process of
summation of individual demand curves, holding the price fixed.

This example incorporates traditional models with additive errors

$$Y_p = q(p) + \varepsilon, \quad \text{where } \varepsilon = Q_\varepsilon(U). \qquad (2.2)$$

Note that the model of demand in i. is more general in that the price elasticity is random,
while in (2.2) it is constant. In other words, (2.2) restricts the price effect to parallel shifts
in demand, while (2.1) allows for general, non-parallel effects.
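
The parallel-shift claim can be checked in one line. The following short derivation is a sketch that uses only (2.2) and the definition of $q(p, \tau)$ as the $\tau$-th quantile of $Y_p$, under the added assumption that $Q_\varepsilon$ is increasing so that the quantile of the sum is the sum of the terms:

```latex
q(p, \tau) = q(p) + Q_\varepsilon(\tau)
\quad\Longrightarrow\quad
q(p', \tau) - q(p, \tau) = q(p') - q(p)
\quad \text{for every } \tau \in (0, 1).
```

The price effect therefore does not depend on $\tau$ in the additive model, whereas in (2.1) the difference $q(p', \tau) - q(p, \tau)$ is free to change with $\tau$.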

Condition iii. is the equilibrium condition that generates endogeneity: the selection of
the actual price by the market depends on the potential demand and supply outcomes i.
and ii. As a result

$$P = \delta(Z, V),$$

where $V$ consists of $U$, $\mathcal{U}$, and other variables (including "sunspot" variables, if the
equilibrium price is not unique).

Instrumental variables $Z$, like weather conditions and factor prices, that shift the supply
curve and do not affect the level of the demand curve $U$ allow identification of the
$\tau$-th quantile of the demand function, $p \mapsto q(p, \tau)$. Perhaps remarkably, the model allows
correlation between $Z$ and $V$ to exist.

2.3. Example: A Roy type Model. An individual considers two levels of schooling
denoted $d = 0, 1$. The potential outcome under each schooling level is given by

$$\{Y_d, d = 0, 1\}.$$

The individual selects his schooling level to maximize his expected utility:

$$D = \arg\max_{d \in \{0,1\}} \{E[W(Y_d) \mid X, Z, V]\}
    = \arg\max_{d \in \{0,1\}} \{E[W(q(d, X, U_d)) \mid X, Z, V]\}, \qquad (2.3)$$

where $W$ is the unobserved Bernoulli utility function, and $E$ is the rational expectation.

As a result,

$$D = \delta(Z, X, V),$$

where $Z, X$ are observed, $V$ is an error vector that depends on ranks $\{U_d\}$ and other
unobserved variables that affect the selection, and $\delta$ is an unknown function.

The similarity assumption imposes that

$$E[W(q(d, X, U_d)) \mid X, Z, V] = E[W(q(d, X, U_{d'})) \mid X, Z, V] \qquad (2.4)$$

for any function $W$, where $E$ is the rational expectation (computed with respect to the
true probability law $P$). In other words, the decision maker's information does not allow
the objective discrimination of systematic variation of his ranks across the treatment states,
where the ranks are defined relative to the observationally identical group of individuals with
the same $X$ and $Z$. Put differently, the similarity assumption is nothing but a restriction
on the information set of the individual.

Note that similarity is defined relative to the true expectation. In effect, (2.4) does not
require an individual to have rational expectations while optimizing in (2.3). Indeed, we
could replace the expectation operator $E$ in formula (2.3) by some subjective expectations
$E_B$ such that

$$D = \arg\max_{d \in \{0,1\}} \{E_B[W(q(d, X, U_d)) \mid X, Z, V]\}.$$

3.  The  Instrumental  Variable  Quantile  Regression  Process 

3.1. The Principle. We first describe two important implications that arise from
assumptions A1-A4.

Proposition 1 (A1-A5 imply Wald IV). Suppose A1-A5 and that $\tau \mapsto q(D, X, \tau)$ is
continuous and strictly increasing a.s. Then for each $\tau \in (0, 1)$

$$P[Y \le q(D, X, \tau) \mid X, Z] = \tau, \quad \text{a.s.} \qquad (3.1)$$

Furthermore,

$$0 = Q_{Y - q(D, X, \tau) \mid X, Z}(\tau) \quad \text{a.s.} \qquad (3.2)$$

Equation (3.1) is a Wald style restriction that can be used to estimate the quantile
process $\tau \mapsto q(d, x, \tau)$. Identification of the quantile process in the population does not
require functional form assumptions. This result is reported in Chernozhukov and Hansen
(2001).
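
A small simulation can make restriction (3.1) concrete. The sketch below is not taken from the paper: the quantile function `q`, the selection rule, the binary instrument, and the sample size are all hypothetical choices, made so that treatment is selected on the rank $U$ and is therefore endogenous. It compares the frequency of the event $Y \le q(D, \tau)$ within cells defined by the instrument with the same frequency within cells defined by the treatment.

```python
import random

def q(d, tau):
    """Hypothetical structural quantile function: q(0, tau) = tau, q(1, tau) = 1 + 2 * tau."""
    return tau + d * (1.0 + tau)

def simulate(n, seed=0):
    """Draw (Y, D, Z) with treatment selected on the rank U (assumed design)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.random()                    # rank U ~ U(0, 1), the same in both states
        z = rng.randint(0, 1)               # binary instrument, independent of U
        d = 1 if u + 0.5 * z > 0.7 else 0   # selection depends on U: endogeneity
        rows.append((q(d, u), d, z))        # observed outcome Y = q(D, U)
    return rows

def freq_below(rows, tau, group, value):
    """Share of observations with Y <= q(D, tau) within the cell group == value."""
    idx = {"d": 1, "z": 2}[group]
    cell = [r for r in rows if r[idx] == value]
    return sum(r[0] <= q(r[1], tau) for r in cell) / len(cell)

rows = simulate(20000)
tau = 0.5
by_z = [freq_below(rows, tau, "z", v) for v in (0, 1)]   # both close to tau
by_d = [freq_below(rows, tau, "d", v) for v in (0, 1)]   # far from tau
print(by_z, by_d)
```

In this design the event $Y \le q(D, \tau)$ coincides with $U \le \tau$, so its frequency is close to $\tau$ in both instrument cells, while conditioning on the selected treatment gives frequencies well away from $\tau$.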

The main identification restriction (3.1) can be posed via (3.2) as an optimization
problem, which we call the instrumental variable or inverse quantile regression for its "inverse"
relation to the (conventional) quantile regression of Koenker and Bassett (1978). This links
the IV-QR model and quantile regression together.

Recall from Koenker and Bassett (1978) that quantile regression is formulated as finding
the best predictor of $Y$ given $X$ under the expected loss using the asymmetric least absolute
deviation criterion:

$$\rho_\tau(u) = \tau u^+ + (1 - \tau) u^-.$$

In other words, assuming integrability, the $\tau$-th conditional quantile of $Y$ given $X$ solves:

$$Q_{Y|X}(\tau) = \arg\min_{f} E[\rho_\tau(Y - f(X)) \mid X].$$

Note that median regression is one important instance of quantile regression.
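
The loss function and the minimization property above can be illustrated with a few lines of code. This is a minimal sketch for the unconditional case (no covariates), with made-up data; the function names are ours. Because the average loss is piecewise linear and convex in the candidate value, a minimizer can always be found among the data points.

```python
def check_loss(u, tau):
    """Asymmetric absolute loss rho_tau(u) = tau * u^+ + (1 - tau) * u^-."""
    return tau * max(u, 0.0) + (1.0 - tau) * max(-u, 0.0)

def sample_quantile_by_loss(ys, tau):
    """Return the data point that minimizes the total check loss over the sample."""
    return min(ys, key=lambda c: sum(check_loss(y - c, tau) for y in ys))

ys = [7, 2, 9, 4, 1, 8, 3, 6, 5]               # the integers 1..9 in arbitrary order
median = sample_quantile_by_loss(ys, 0.5)      # minimizer for tau = 0.5
lower = sample_quantile_by_loss(ys, 0.3)       # minimizer for tau = 0.3
print(median, lower)
```

With nine observations the minimizers are the fifth and third order statistics, matching the usual sample median and 0.3-quantile.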

Proposition 1 states that 0 is the $\tau$-th quantile of random variable $Y - q(D, X, \tau)$
conditional on $(X, Z)$:

$$0 = Q_{Y - q(D, X, \tau)}(\tau \mid X, Z) \quad \text{a.s. for each } \tau.$$

Thus, we may pose the problem of finding a function $q(d, x, \tau)$ satisfying equation (3.2) of
Proposition 1 as the instrumental variable or inverse quantile regression:

Find a function $q(d, x, \tau)$ such that 0 is a solution to the quantile regression of
$Y - q(D, X, \tau)$ on $(Z, X)$:

$$0 = \arg\min_{f} E[\rho_\tau(Y - q(D, X, \tau) - f(X, Z)) \mid X, Z].$$

3.2. An Instrumental Variable Quantile Regression Process. For estimation of the
IV-QR process, we focus on the basic linear model, which covers a wide range of applications:

$$q(D, X, \tau) = D'\alpha(\tau) + X'\beta(\tau), \qquad (3.3)$$

where $D$ is an $l$-vector of treatment variables (possibly interacted with covariates) and $X$
is a $k$-vector of (transformations of) covariates. The linear quantile model is obviously a
special case of the more general model presented in Assumption A1, and is a basic model
of quantile regression research.

We focus on a simple finite-sample analog of the instrumental variable quantile
regression in the population. Define the weighted quantile regression objective function as

$$Q_n(\tau, \alpha, \beta, \gamma) = \frac{1}{n} \sum_{i=1}^{n} \rho_\tau\big(Y_i - D_i'\alpha - X_i'\beta - \hat\Phi_i(\tau)'\gamma\big) \cdot \hat V_i(\tau), \quad \text{where}$$

$\hat\Phi_i(\tau) = \hat\Phi(\tau, X_i, Z_i)$ is an $l$-vector of instruments ($l = \dim(D)$),
$\hat V_i(\tau) = \hat V(\tau, X_i, Z_i) > 0$ is a weight function.

In principle, we may consider a larger number of elements in vector $\Phi$. However, this is not
necessary as efficiency can instead be achieved by choosing $\Phi$ and $V$ appropriately.

The IV-QR procedure is as follows: for $\|x\|_A = \sqrt{x'Ax}$,

$$\hat\alpha(\tau) = \arg\inf_{\alpha \in \mathcal{A}} \|\hat\gamma(\alpha, \tau)\|_A, \quad \text{such that} \qquad (3.4)$$

$$(\hat\beta(\alpha, \tau), \hat\gamma(\alpha, \tau)) = \arg\inf_{(\beta, \gamma) \in \mathcal{B} \times \mathcal{G}} Q_n(\tau, \alpha, \beta, \gamma), \qquad (3.5)$$

where $\mathcal{A}$ and $\mathcal{B}$ are parameter sets, $\mathcal{G}$ is any fixed compact cube centered at 0,
and $A$ is a positive definite matrix.

The parameter estimates are given by:

$$(\hat\alpha(\tau), \hat\beta(\tau)) = (\hat\alpha(\tau), \hat\beta(\hat\alpha(\tau), \tau)). \qquad (3.6)$$

Let us denote

$$\theta(\tau) = (\alpha(\tau)', \beta(\tau)')' \quad \text{and} \quad \hat\theta(\tau) = (\hat\alpha(\tau)', \hat\beta(\tau)')'.$$

There are three principal motivations for this estimator. First, it naturally links IV
restrictions and conventional quantile regression together by exploiting the principle described
in the previous section. Second, it is computationally convenient, since the estimates can
be computed by implementing a series of ordinary quantile regressions (convex optimization
problems), implying a need for a grid search only over the $\alpha$-parameter. Third, it is
asymptotically equivalent to a GMM estimator and achieves maximal efficiency by choosing
instruments $\Phi$ and weights $V$ appropriately.
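
The grid-search structure of the procedure can be sketched in a deliberately simple special case. The code below is an illustration under our own assumptions, not the authors' implementation: a scalar binary treatment, a single binary instrument used directly as $\Phi$, unit weights, and no covariates other than an intercept. In that saturated case the inner quantile regression of $Y - D\alpha$ on $(1, Z)$ has a closed form, since the coefficient on $Z$ is the difference between the two within-group sample quantiles. The simulated design and the names `gamma_hat` and `ivqr_grid` are hypothetical.

```python
import math
import random

def sample_quantile(values, tau):
    """tau-th sample quantile, taken as the ceil(n * tau)-th order statistic."""
    s = sorted(values)
    return s[max(math.ceil(len(s) * tau) - 1, 0)]

def gamma_hat(rows, alpha, tau):
    """Coefficient on the binary instrument in the QR of Y - D * alpha on (1, Z).

    With one binary regressor the regression is saturated, so the coefficient
    equals the difference of the within-group sample quantiles.
    """
    r1 = [y - d * alpha for (y, d, z) in rows if z == 1]
    r0 = [y - d * alpha for (y, d, z) in rows if z == 0]
    return sample_quantile(r1, tau) - sample_quantile(r0, tau)

def ivqr_grid(rows, tau, grid):
    """Grid search: pick the alpha whose instrument coefficient is closest to zero."""
    return min(grid, key=lambda a: abs(gamma_hat(rows, a, tau)))

# Simulated data (assumed design): Y = U + D * (1 + U), so alpha(tau) = 1 + tau.
rng = random.Random(1)
rows = []
for _ in range(6000):
    u = rng.random()
    z = rng.randint(0, 1)
    d = 1 if u + 0.5 * z > 0.7 else 0       # treatment selected on the rank U
    rows.append((u + d * (1.0 + u), d, z))

grid = [0.05 * j for j in range(61)]        # candidate values of alpha in [0, 3]
alpha_hat = ivqr_grid(rows, 0.5, grid)      # true value at tau = 0.5 is 1.5
print(alpha_hat)
```

With covariates or continuous instruments, the inner step would instead call a general quantile regression routine for each grid value of $\alpha$; only the outer search over $\alpha$ is non-convex.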

The instrumental variable quantile regression process is defined as

$$\hat\theta(\cdot) = (\hat\theta(\tau), \ \tau \in \mathcal{T}),$$

where $\mathcal{T}$ is a closed subinterval of $(0, 1)$.

3.3.  Assumptions.  In  order  to  obtain  properties  of  the  IV-QR  process,  we  first  impose  a 
set  of  simple  regularity  conditions. 

Assumption 2 (Conditions for Estimation). In addition to (3.3), suppose

R1 Sampling. $(Y_i, D_i, X_i, Z_i)$ are iid defined on the probability space $(\Omega, F, P)$ and
take values in a compact set.
R2 Compactness. For all $\tau$, $(\alpha(\tau), \beta(\tau)) \in \mathrm{int}\, \mathcal{A} \times \mathcal{B}$, and $\mathcal{A} \times \mathcal{B}$ is compact and convex.
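R3 Full Rank and Continuity. a.s. $\sup_{y \in \mathbb{R}} f_{Y|(X,D,Z)}(y) < K$ and

$$\frac{\partial}{\partial(\alpha', \beta', \gamma')} E_P\big[1(Y < D'\alpha + X'\beta + \Phi(\tau)'\gamma)\, \Psi(\tau)\big], \quad \Psi(\tau) \equiv V(\tau) \cdot [X' : \Phi(\tau)']',$$

has full rank, and is continuous in $(\alpha, \beta, \gamma, \tau)$ uniformly over $\mathcal{A} \times \mathcal{B} \times \mathcal{G} \times \mathcal{T}$.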
R4 Estimated Instruments and Weights. With probability approaching 1,
$\hat\Phi(\tau, z, x), \hat V(\tau, z, x) \in \mathcal{F}$ and
$\hat V(\tau, z, x) \to V(\tau, z, x) \in \mathcal{F}$, $\hat\Phi(\tau, z, x) \to \Phi(\tau, z, x) \in \mathcal{F}$
uniformly in $(\tau, z, x)$ over compact sets. For all $\tau$, functions in
$\mathcal{F}$: $(\tau, z, x) \mapsto f(\tau, z, x)$ are uniformly smooth
functions in $(z, x)$, $C_M^{\eta}$,⁴ with the uniform smoothness order $\eta > \dim(d, z, x)/2$,
and $f$ are uniformly Hölder in $\tau$: $\|f(\tau', z, x) - f(\tau, z, x)\| < C|\tau - \tau'|^a$, where the
constants $C > 0$ and $0 < a < 1$ are independent of $(z, x, \tau, \tau')$.

⁴ See page 154 in van der Vaart and Wellner (1996).


Condition R1 imposes iid sampling and compactness on the support of the economic
variables. The compactness is hardly restrictive in micro-econometric applications, but it can be
relaxed. Condition R2 imposes compactness on the parameter space. Such an assumption is
needed at least for the parameter $\alpha(\tau)$ since the objective function is not convex in $\alpha$. The
full rank condition in R3 implies global identification, and the continuity condition in R3
together with R1 suffices for asymptotic normality. The parametric identification condition
is similar in spirit to the nonparametric identification conditions obtained by Chernozhukov
and Hansen (2001) and is of independent interest. Essentially, this condition requires that
the instrument $\Phi$ impacts the joint distribution of $(Y, D)$ at many relevant points. Clearly,
conditions R1-R4 may be refined at a cost of more complicated notation and proof.

The role of R4 is to allow possibly estimated instruments and weights. Smoothness in R4
needs to hold only for the non-discrete sub-component of $(d, x, z)$. Condition R4 allows for a
wide variety of nonparametric and parametric estimators, as shown by Andrews (1994). See
also Andrews (1995), Newey (1990, 1997), and Newey and Powell (1990) for other examples
of such estimators. The smoothness condition in R4 can be replaced by a more general
condition of $\mathcal{F}$ having a finite $L_2(P)$-bracketing entropy integral.

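3.4. Estimation Theory. Theorem 1 describes the distribution of the IV-QR process.

Theorem 1 (IV-QR Process). Given Assumptions 1-2, for $\varepsilon_i(\tau) = Y_i - D_i'\alpha(\tau) - X_i'\beta(\tau)$,

$$\sqrt{n}\,(\hat\theta(\cdot) - \theta(\cdot)) = \cdots \Rightarrow b(\cdot) \quad \text{in } \ell^\infty(\mathcal{T}),$$

uniformly in $\tau$, where

$$l_i(\tau, \theta(\tau)) = (\tau - 1(\varepsilon_i(\tau) < 0))$$

and $b(\cdot)$ is a mean zero Gaussian process with covariance function:

$$E\, b(\tau) b(\tau')' = J(\tau)^{-1} S(\tau, \tau') [J(\tau')^{-1}]',$$

where

$$J(\tau) = E\big[f_{\varepsilon(\tau)}(0 \mid X, D, Z)\, \Psi(\tau) [D' : X']\big], \quad S(\tau, \tau') = (\min(\tau, \tau') - \tau\tau')\, E\, \Psi(\tau) \Psi(\tau')'.$$

Theorem 1 simply states that $\hat\theta(\cdot)$ is approximately distributed as a continuous random
Gaussian function. This implies a variety of useful results.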

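Corollary 1 (Normality). For any finite collection of quantile indices $\{\tau_j, j \in J\}$,

$$\{\sqrt{n}\,(\hat\theta(\tau_j) - \theta(\tau_j))\}_{j \in J} \xrightarrow{d} N\big(0, \{J(\tau_k)^{-1} S(\tau_k, \tau_l) [J(\tau_l)^{-1}]'\}_{k, l \in J}\big).$$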


Corollary 2 (Efficient Weights and Instruments). When we choose the weights
$V^*(\tau) = f_{\varepsilon(\tau)}(0 \mid X, Z)$, $v(\tau) = f_{\varepsilon(\tau)}(0 \mid D, X, Z)$,
$\Phi^*(\tau) = E[D\, v(\tau) \mid X, Z] / V^*(\tau)$, and $\Psi^*(\tau) = V^*(\tau) [X' : \Phi^*(\tau)']'$,
the covariance function equals

$$E\, b(\tau) b(\tau')' = (\min(\tau, \tau') - \tau\tau') \cdot [E\, \Psi^*(\tau) \Psi^*(\tau')']^{-1}.$$

This choice of instruments and weights leads to an efficient procedure. This can be shown
by appealing to Chamberlain's (1986) arguments. Regularity condition R4 allows use of a
wide variety of nonparametric estimators and parametric approximations of the optimal $\Phi$
and $V$. For particular examples of such procedures, see Amemiya (1977), Andrews (1994,
1995), Newey (1990, 1997), and Newey and Powell (1990). An example of a simple and
practical strategy for empirical work is to construct $\hat\Phi$ as an OLS projection of $D$ on $X$ and
$Z$ (and possibly their powers) and set $\hat V_i = 1$.

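Corollary 3 (Distribution-Free Limits). For $W(\tau) = \tau(1 - \tau) J(\tau)^{-1} E\, \Psi(\tau) \Psi(\tau)' [J(\tau)^{-1}]'$,

$$W(\tau)^{-1/2} \sqrt{n}\,(\hat\theta(\tau) - \theta(\tau)) \Rightarrow B_p(\tau),$$

where $B_p$ is a standard $p$-dimensional Brownian bridge ($p = l + k$) with covariance
operator

$$E\, B_p(\tau) B_p(\tau')' = (\min(\tau, \tau') - \tau\tau') I_p.$$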

4.  Inference 

Several distributional hypotheses have been posed in the fundamental econometric and
statistical literature. For example, an essential hypothesis is whether the treatment exhibits
a pure location (constant treatment) effect or a general shape effect, e.g. Doksum (1974) and
Koenker and Xiao (2002), or whether the treatment creates a stochastic dominance effect,
cf. Abadie (2002), Heckman and Smith (1997), and McFadden (1989). In structural and
causal empirical analysis, the hypothesis of endogeneity is also fundamental, motivating
Hausman tests, cf. Hausman (1978). In this section, we describe inference procedures to
test these hypotheses.


4.1. The Inference Problem. All of our hypotheses will be embedded in the following
null hypothesis:

$$R(\tau)(\theta_n(\tau) - r_n(\tau)) = 0, \quad \text{for each } \tau \in \mathcal{T}, \qquad (4.1)$$

where $R(\tau)$ denotes a known $q \times p$ matrix, $q \le p = \dim(\theta)$, and $r_n(\tau) \in \mathbb{R}^p$. The parameter
$\theta_n(\cdot)$ is made explicitly dependent on the sample size $n$ to accommodate asymptotic analysis
under local alternatives.

The tests will be based on the instrumental variable quantile regression process, $\hat\theta(\cdot)$. We
will focus on the basic inference process,

$$v_n(\tau) = R(\tau)(\hat\theta(\tau) - \hat r(\tau)), \qquad (4.2)$$

and statistics of the form $S_n = f(\sqrt{n}\, v_n(\cdot))$ derived from it. In particular, we will be
interested in the Kolmogorov-Smirnov (KS) and Smirnov-Cramér-von Mises (CM) statistics,
which are given by

$$S_n = \sqrt{n} \sup_{\tau \in \mathcal{T}} \|v_n(\tau)\|_{\hat\Lambda(\tau)}, \qquad S_n = n \int_{\mathcal{T}} \|v_n(\tau)\|^2_{\hat\Lambda(\tau)}\, d\tau, \qquad (4.3)$$

respectively, where $\|a\|_\Lambda = \sqrt{a'\Lambda a}$, the symmetric $\hat\Lambda(\tau) \to \Lambda(\tau)$ uniformly in $\tau$, and $\Lambda(\tau)$
is a positive definite symmetric matrix uniformly in $\tau$. The choice of $\Lambda$ and $\hat\Lambda$ is discussed
in Sections 4.3 and 4.5.
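
For a scalar inference process ($q = 1$) evaluated on an equally spaced grid of quantile indices, the two statistics in (4.3) reduce to a maximum and a Riemann sum. The sketch below is only an illustration under those assumptions: the grid, the sample size, and the values of $v_n(\tau)$ are made up, and the weight defaults to 1 at every grid point.

```python
import math

def ks_statistic(v, n, weights=None):
    """sqrt(n) * max over the grid of sqrt(Lambda * v^2), for a scalar process v."""
    w = weights or [1.0] * len(v)
    return math.sqrt(n) * max(math.sqrt(wi * vi * vi) for vi, wi in zip(v, w))

def cm_statistic(v, n, step, weights=None):
    """n * Riemann sum of Lambda * v^2 over an equally spaced grid with spacing step."""
    w = weights or [1.0] * len(v)
    return n * step * sum(wi * vi * vi for vi, wi in zip(v, w))

taus = [0.25, 0.50, 0.75]          # hypothetical grid of quantile indices
v = [0.10, -0.20, 0.05]            # hypothetical values of v_n(tau) on the grid
ks = ks_statistic(v, n=100)        # 10 * 0.20
cm = cm_statistic(v, n=100, step=0.25)
print(ks, cm)
```

In practice the grid would be much finer, and the critical values would come from the resampling procedure rather than from tables, since the limit distribution generally depends on nuisance parameters.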

Example 1 (Hypothesis of Equality of Distributions). A basic hypothesis is that the
treatment impacts the outcomes significantly:

$$\alpha(\tau) \ne 0 \quad \text{for some } \tau \text{ in } \mathcal{T}.$$

In this case,

$$R(\tau) = R = [1, 0, \ldots] \quad \text{and} \quad r(\tau) = 0.$$

The next example imposes a restrictive, yet simple mechanism through which the
treatment may operate. This mechanism requires that the effect is constant across the
distribution. The alternative is that the effect varies across quantiles. The alternative hypothesis
of heterogeneous treatment effects is of fundamental importance because it motivates the
modern causal models, see e.g. Heckman (1990), which were developed specifically to cope
with varying effects.

Example 2 (Location-Shift or Constant Effect Hypothesis). The hypothesis of a
constant treatment effect is that the treatment $D$ affects only the location of outcome $Y$,
but not any other moments. That is,

$$\exists \alpha : \alpha(\tau) = \alpha, \quad \text{for each } \tau \in \mathcal{T}.$$

In this case,

$$R(\tau) = R = [1, 0, \ldots] \quad \text{and} \quad r(\tau) = r = (\alpha, \beta)', \quad \text{implying} \quad Rr = \alpha,$$

which asserts that $\alpha(\tau)$ is constant across all $\tau \in \mathcal{T}$. The component $r$ can be estimated
by any method consistent with the null, e.g. $\hat r = (\hat\alpha(\tfrac{1}{2})', \hat\beta(\tfrac{1}{2})')'$.

Example 3 (Dominance Hypothesis). The test of stochastic dominance, or whether the
treatment is unambiguously beneficial, involves the dominance null

$$\alpha(\tau) \ge 0, \quad \text{for all } \tau \in \mathcal{T},$$

versus the non-dominance alternative

$$\alpha(\tau) < 0, \quad \text{for some } \tau \in \mathcal{T}.$$

In this case, the least favorable null involves

$$R(\tau) = R = [1, 0, \ldots] \quad \text{and} \quad r(\tau) = 0,$$

and one may use the one-sided KS or CM statistics, cf. Abadie (2002),

$$S_n = \sqrt{n} \sup_{\tau \in \mathcal{T}} \max(-\hat\alpha(\tau), 0), \quad \text{and} \quad S_n = n \int_{\mathcal{T}} \|\max(-\hat\alpha(\tau), 0)\|^2_{\hat\Lambda(\tau)}\, d\tau,$$

to test the hypothesis.
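
The one-sided statistics only accumulate evidence from quantile indices where the estimated effect is negative. A minimal sketch, assuming a scalar treatment effect on an equally spaced grid and unit weights by default; the estimates below are invented for illustration:

```python
import math

def one_sided_ks(alpha_hat, n):
    """sqrt(n) * largest negative part of the estimated effect over the grid."""
    return math.sqrt(n) * max(max(-a, 0.0) for a in alpha_hat)

def one_sided_cm(alpha_hat, n, step, weights=None):
    """n * Riemann sum of Lambda * (negative part)^2 over an equally spaced grid."""
    w = weights or [1.0] * len(alpha_hat)
    return n * step * sum(wi * max(-a, 0.0) ** 2 for a, wi in zip(alpha_hat, w))

alpha_hat = [0.30, -0.10, 0.20, -0.05]       # hypothetical estimates of alpha(tau)
ks = one_sided_ks(alpha_hat, n=400)          # 20 * 0.10
cm = one_sided_cm(alpha_hat, n=400, step=0.2)
print(ks, cm)
```

Both statistics are exactly zero when every estimate on the grid is non-negative, so large values arise only from violations of the dominance null.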

Example 4 (Exogeneity Hypothesis). In a basic model, the quantiles of potential or
counterfactual outcomes, conditional on $X$, are given by

$$Q_{Y_d|X}(\tau) = d'\alpha(\tau) + X'\beta(\tau).$$

Suppose that the treatment $D$ is chosen independently of outcomes, that is, $D$ is independent
of $\{U_d\}$, conditional on $X$. Then the quantiles of realized outcome $Y$, conditional on $D$ and
$X$, are given by

$$Q_{Y|D,X}(\tau) = D'\alpha(\tau) + X'\beta(\tau).$$

Thus, in the absence of endogeneity, $(\alpha(\tau)', \beta(\tau)')'$ can be estimated using the conventional
quantile regression without instrumenting. The difference between IV-QR estimates, $\hat\theta(\cdot)$,
and QR estimates, $\hat\vartheta(\cdot)$, can be used to formulate a Hausman test of the null hypothesis of
no endogeneity:

$$\alpha(\tau) = \vartheta(\tau)_1 \quad \text{for each } \tau \text{ in } \mathcal{T}, \quad \text{where } \vartheta(\tau) = \operatorname{plim} \hat\vartheta(\tau), \qquad (4.4)$$

and $\hat\vartheta(\cdot)_1$ is the QR estimate of $\alpha(\cdot)$ obtained without instrumenting. In this case,

$$R(\tau) = [1, 0, \ldots], \quad \text{and} \quad r(\tau) = \vartheta(\tau).$$

The alternative of endogeneity states:

$$\exists \tau \in \mathcal{T} : \alpha(\tau) \ne \vartheta(\tau)_1. \qquad (4.5)$$

4.2.  The  Assumptions.  We  will  maintain  the  following  technical  assumptions. 
Assumption  3  (Conditions  for  Inference). 

1.1  (Yj,  Z?j,  X i,  Zi)  are  iid  on  the  probability  space  (O,  F,  Pn).  The  law  of  (Yt,  Dt,  Zt,  Xt,  t  < 
n),  P„\  is  contiguous  to  some  P["),5  and  either 

(a)  for  a  fixed  continuous  function  p(r)  :  7  — >  W  and  for  each  n 

R(t)  (9ti{t)  -  rn(r))  =  g(r),    g{r)  =  p(T)/y/n,  or, 

(b)  for  a  fixed  continuous  function  g(r)  :  7  — ¥  K9  and  for  each  n 

R(r)(9(T)-r(T))=g(r). 


5Contiguity  is  defined  as  on  p.  87  in  van  der  Vaart  (1998) 
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Functions  -R(r),  g(r),  p(r),  and  limn  r„(-r)  are  continuous  in  t. 
I.2    (a) Under any local alternative, I.1(a),

\[ \sqrt{n}\,(\hat\theta(\cdot) - \theta_n(\cdot)) \Rightarrow b(\cdot), \quad \sqrt{n}\,(\hat r(\cdot) - r_n(\cdot)) \Rightarrow d(\cdot), \]

jointly in $\ell^\infty(\mathcal{T})$, where $b(\cdot)$ and $d(\cdot)$ are jointly zero mean Gaussian functions.
(b) Under the global alternative, I.1(b), the same holds, except that the limit $(b(\cdot), d(\cdot))$ need not have the same distribution as in I.2(a).

Conditions I.1(a) and I.1(b) formulate a local and a global alternative.

Condition I.2 requires that the estimates of $\theta(\cdot)$ and $r(\cdot)$ are asymptotically Gaussian. In our examples, this is guaranteed by Theorem 1 and asymptotic results for conventional QR. Note that I.2 is formulated so that other asymptotically Gaussian estimators of the parameters of the IV-QR model are permitted. A detailed discussion of this assumption is given after Assumption 4.

4.3.  Inference  Theory.  We  are  now  prepared  to  state  a  main  result. 

Theorem 2 (Inference). For $f$ denoting the two- and one-sided KS or CM statistics:

1. Under Assumptions 1, 2, 3.I.1(a), and 3.I.2(a), in $\ell^\infty(\mathcal{T})$

\[ S_n \Rightarrow S = f(v_0(\cdot) + p(\cdot)), \]

where $v_0(\tau) = R(\tau)(b(\tau) - d(\tau))$. If $v_0(\cdot)$ has a nondegenerate covariance kernel, when the null is true ($p = 0$), we have for $\alpha < 1/2$

\[ P_n(S_n > c(1 - \alpha)) \to \alpha = P(f(v_0(\cdot)) > c(1 - \alpha)), \]

and when the null is not true ($p \ne 0$),

\[ P_n(S_n > c(1 - \alpha)) \to \beta = P(f(v_0(\cdot) + p(\cdot)) > c(1 - \alpha)) > \alpha. \]

2. Under Assumptions 1, 2, 3.I.1(b), and 3.I.2(b),

\[ S_n \to_p \infty, \quad P_n(S_n > c(1 - \alpha)) \to 1. \]

Theorem 2 states the limit distribution of the KS and CM statistics under local alternatives and the global alternative. Note that in the statement of Theorem 2 we implicitly assume that for the case of one-sided tests in Example 3, the local or global alternatives to the least favorable null violate the composite null.

As it stands, Theorem 2 does not provide us with operational tests, since we do not know the critical value

\[ c(1 - \alpha): \; P(f(v_0(\cdot)) > c(1 - \alpha)) = \alpha. \]

In general, one faces the Durbin problem when estimating $c(1 - \alpha)$, since the limit is generally not distribution-free and simulating it directly is infeasible.

In several important cases, such as Examples 1 and 3, the Durbin component $d(\cdot)$ is equal to zero and so is not present. In these cases, it is possible by picking an appropriate weight matrix $\Lambda(\tau)$ to achieve asymptotically distribution-free inference, at least under iid sampling.

Corollary 4 (Distribution-Free Inference). Suppose $d(\cdot) = 0$. If we pick matrix $\Lambda(\tau) = [R(\tau)W(\tau)R(\tau)']^{-1}$ with $W(\tau) = \tau(1 - \tau)\,J(\tau)^{-1}E[\Psi(\tau)\Psi(\tau)'][J(\tau)^{-1}]'$ to enter the norm $\|\cdot\|_{\Lambda(\tau)}$ in the definition of the KS and CM statistics $f$ in (4.3), and suppose we have $\hat\Lambda(\tau) = \Lambda(\tau) + o_p(1)$ uniformly in $\tau$, then

\[ S_n \Rightarrow f(B_q(\cdot)), \]

where $B_q$ is the standard $q$-dimensional Brownian bridge with covariance function:

\[ E B_q(\tau) B_q(\tau')' = (\min(\tau, \tau') - \tau\tau')\, I_q. \]
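When the limit is a functional of a standard Brownian bridge, critical values can be approximated by simulation. The sketch below is one hedged way to do this for the scalar case ($q = 1$) and the two-sided KS functional, the supremum of the absolute value of the bridge over a subinterval. The function name, the grid resolution, the number of replications, and the restriction to $q = 1$ are all assumptions made for illustration.

```python
import numpy as np

def bridge_sup_critical_value(level=0.95, lo=0.0, hi=1.0,
                              steps=500, reps=4000, seed=0):
    """Simulated quantile of sup_{tau in [lo, hi]} |B(tau)| for a scalar
    standard Brownian bridge B (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    grid = np.arange(1, steps + 1) / steps

    # Brownian motion paths built from scaled Gaussian increments.
    increments = rng.standard_normal((reps, steps)) / np.sqrt(steps)
    w = np.cumsum(increments, axis=1)

    # Brownian bridge: B(tau) = W(tau) - tau * W(1).
    bridge = w - grid * w[:, -1:]

    keep = (grid >= lo) & (grid <= hi)
    sup_abs = np.max(np.abs(bridge[:, keep]), axis=1)
    return float(np.quantile(sup_abs, level))
```

On the full unit interval the simulated 95% value should land near the classical Kolmogorov critical value of roughly 1.36, slightly below it because of the discrete grid. Restricting to an interior interval such as [0.25, 0.75] gives a smaller value.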

In other important cases, such as Examples 2 and 4, the transformation used in Corollary 4 will not provide distribution-free limits. There are several ways to proceed. One method is a martingale transformation using Khmaladzation, cf. Koenker and Xiao (2002). Another method is a simple resampling with recentering of the inference process around its sample realization, cf. Chernozhukov (2002). Simulation results in Chernozhukov (2002) suggest that resampling has accurate size and somewhat better power than Khmaladzation. In the next section, we describe a different resampling method that delivers the same asymptotic quality and is attractive computationally.

4.4.  Inference by Resampling. The method of resampling we suggest in this paper does not require the recomputation of the estimates over the resampling steps, which may be quite laborious since the optimization problem requires many computations of ordinary quantile regressions for many values of $\alpha$ and $\tau$. Instead we resample the linear approximation of the empirical inference processes. In addition, to facilitate a feasible, practical implementation, we employ the m out of n bootstrap (subsampling).6

Suppose that we have a linear representation for the inference process:

\[ \sqrt{n}\,v_n(\cdot) - \sqrt{n}\,g(\cdot) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} z_i(\cdot) + o_p(1), \tag{4.6} \]

where $z_i(\cdot)$ is defined below in Proposition 2.

Given a sample of the estimated scores (estimation is discussed below),

\[ \{\hat z_i(\tau),\; i \le n,\; \tau \in \mathcal{T}\}, \]

consider the following steps.

Step 1. Construct all subsets $I_j$ of $\{1, \ldots, n\}$ of size $b$. Denote such subsets as $I_j$, $j \le B_n$. The number of such subsets $B_n$ is "n choose b."7

Denote by $v_{j,b,n}$ the inference process computed over the $j$-th subset of data $I_j$,

\[ v_{j,b,n}(\tau) = \frac{1}{b} \sum_{i \in I_j} \hat z_i(\tau), \]

and define $S_{j,b,n} = f(\sqrt{b}\,[v_{j,b,n}(\cdot)])$ as

\[ S_{j,b,n} = \sup_{\tau \in \mathcal{T}} \sqrt{b}\,\|v_{j,b,n}(\tau)\|_{\hat\Lambda(\tau)} \quad \text{or} \quad S_{j,b,n} = b \int_{\mathcal{T}} \|v_{j,b,n}(\tau)\|^2_{\hat\Lambda(\tau)}\, d\tau, \]

for cases when $S_n$ is the Kolmogorov-Smirnov (KS) or Smirnov-Cramér-von Mises (CM) statistic, respectively. Define for $S = f(v_0(\cdot))$

\[ \Gamma(x) = \Pr\{S \le x\}. \]

Step 2. Estimate $\Gamma(x)$ by

\[ \hat\Gamma_{b,n}(x) = B_n^{-1} \sum_{j=1}^{B_n} 1\{S_{j,b,n} \le x\}. \]

Step 3. The critical value is obtained as the $(1 - \alpha)$-th quantile of $\hat\Gamma_{b,n}(\cdot)$:

\[ c_{b,n}(1 - \alpha) = \hat\Gamma_{b,n}^{-1}(1 - \alpha) = \inf\{c: \hat\Gamma_{b,n}(c) \ge 1 - \alpha\}. \]

The size $\alpha$ test rejects the null hypothesis when $S_n > c_{b,n}(1 - \alpha)$.

6 The full bootstrap is hardly feasible in our second empirical example, even though we are bootstrapping a linear statistic.
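The three steps above can be sketched in a few lines, under simplifying assumptions that are not taken from the paper: the scores are scalar at each grid point, the weight in the norm is the identity, and a fixed number of randomly drawn subsets (without replacement within a subset) stands in for all "n choose b" subsets. The function name `subsample_critical_value` and its arguments are hypothetical.

```python
import numpy as np

def subsample_critical_value(z_hat, b, level=0.95, n_subsets=1000,
                             stat="ks", taus=None, seed=0):
    """Score-subsampling critical value (illustrative sketch).

    z_hat : (n, T) array, estimated scalar scores z_i(tau_t) on a tau grid
    b     : subsample size
    stat  : "ks" for the sup statistic, "cm" for the integral statistic
    taus  : tau grid, needed only when stat == "cm"
    """
    rng = np.random.default_rng(seed)
    z_hat = np.asarray(z_hat, dtype=float)
    n = z_hat.shape[0]
    stats = np.empty(n_subsets)

    for j in range(n_subsets):
        # Step 1: draw a subset I_j of size b and average the scores over it.
        idx = rng.choice(n, size=b, replace=False)
        v = z_hat[idx].mean(axis=0)
        if stat == "ks":
            stats[j] = np.sqrt(b) * np.max(np.abs(v))
        else:
            sq = v ** 2  # trapezoid rule for b * integral of v(tau)^2
            stats[j] = b * np.sum(0.5 * (sq[1:] + sq[:-1]) * np.diff(taus))

    # Steps 2-3: smallest c whose empirical cdf is at least `level`.
    stats.sort()
    k = max(int(np.ceil(level * n_subsets - 1e-9)) - 1, 0)
    return float(stats[k])
```

Because only averages of precomputed scores are taken, no quantile regression is re-estimated inside the loop, which is the computational point of the method. A test would then reject when the full-sample statistic exceeds the returned value.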

In obtaining the linear representation (4.6), we make use of the following assumption.

Assumption 4 (Linear Representations). In addition to I.1 and I.2:

I.3    (a) Under any local alternative, I.1(a), there exist sums of iid mean zero vectors such that uniformly in $\tau$ in $\mathcal{T}$

\[ \sqrt{n}\,(\hat\theta(\cdot) - \theta_n(\cdot)) = J(\cdot)^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} l_i(\cdot, \theta_n(\cdot))\Psi_i(\cdot) + o_{P_n}(1) \Rightarrow b(\cdot), \]

\[ \sqrt{n}\,(\hat r(\cdot) - r_n(\cdot)) = H(\cdot)^{-1} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} d_i(\cdot, r_n(\cdot))\Upsilon_i(\cdot) + o_{P_n}(1) \Rightarrow d(\cdot), \]

jointly in $\ell^\infty(\mathcal{T})$, where $b(\cdot)$ and $d(\cdot)$ are jointly zero mean Gaussian functions.
(b) Under the global alternative, I.1(b), the same holds, except that the limit $(b(\cdot), d(\cdot))$ need not have the same distribution as in I.3(a).

I.4    (a) We have the estimates $\tau \mapsto \hat l_i(\tau, \hat\theta(\tau))\hat\Psi_i(\tau)$ and $\tau \mapsto \hat d_i(\tau, \hat r(\tau))\hat\Upsilon_i(\tau)$ that take realizations in a Donsker class of functions with a constant envelope and are uniformly consistent in $\tau$ under the $L_2(P)$ semimetric.8

(b) Wp $\to 1$ for each $i$: $E_{P_n} l_i(\tau, \theta_n(\tau)) f_i(\tau)|_{f = \hat\Psi} = 0$, $E_{P_n} d_i(\tau, r_n(\tau)) f_i(\tau)|_{f = \hat\Upsilon} = 0$.

(c) Functions $l_i$ and $d_i$ are $L_2(P)$-Lipschitz uniformly in $\tau$: $[E_P \|l_i(\tau, \theta) - l_i(\tau, \theta')\|^2]^{1/2} \le C\,\|\theta' - \theta\|$, $[E_P \|d_i(\tau, r) - d_i(\tau, r')\|^2]^{1/2} \le C\,\|r' - r\|$, uniformly in $(\theta, \theta', r, r')$ over compact sets.

7 A smaller number $B_n$ of randomly chosen subsets can also be used, if $B_n \to \infty$ as $n \to \infty$, cf. Section 2.5 in Politis, Romano, and Wolf (1999). The subsampling is done with replacement. However, if $b^2/n \to 0$, subsampling without replacement and subsampling with replacement are equivalent wp $\to 1$.

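Proposition 2 (Linear Representations). Under Assumptions 3-4 we have:

\[ \sqrt{n}\,v_n(\cdot) - \sqrt{n}\,g(\cdot) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} z_i(\cdot) + o_p(1), \tag{4.7} \]

where

\[ z_i(\cdot) = R(\cdot)\left[J(\cdot)^{-1} l_i(\cdot, \theta(\cdot))\Psi_i(\cdot) - H(\cdot)^{-1} d_i(\cdot, r(\cdot))\Upsilon_i(\cdot)\right]. \]

Thus, the estimate of $z_i(\cdot)$ is given by

\[ \hat z_i(\tau) = R(\tau)\left[\hat J(\tau)^{-1} \hat l_i(\tau, \hat\theta(\tau))\hat\Psi_i(\tau) - \hat H(\tau)^{-1} \hat d_i(\tau, \hat r(\tau))\hat\Upsilon_i(\tau)\right], \]

where $\hat J(\tau)$ and $\hat H(\tau)$ are any uniformly consistent estimates.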

In Assumption 4, condition I.3 requires that the estimates of $\theta$ and $r$ entering the null hypotheses have asymptotically linear representations of the form defined above, and that asymptotic normality applies to these estimates. Note that I.3 is formulated so that other asymptotically Gaussian estimators of the IV-QR model are permitted. Conditions I.4(a) and I.4(c) impose the smoothness needed for developing the theory of the resampling inference. These conditions are also satisfied in all of the examples considered in this paper. Condition I.4(b) is the condition of "orthogonality," which implies that the estimation of the weights $\Psi_i$ and $\Upsilon_i$ has no effect on the asymptotic distribution of the linear representation in I.3.

Proposition 3 in Appendix C formally verifies that I.3 and I.4 are satisfied for the particular implementations that we propose. Next, we briefly go through the examples and state the scores $z_i$ for each of them.

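1. Test of Equality of Distributions: Since $r = 0$ is not estimated,

\[ \hat z_i(\tau) = R(\tau)\left[\hat J(\tau)^{-1} \hat l_i(\tau, \hat\theta(\tau))\hat\Psi_i(\tau)\right], \]

where $\hat l_i(\tau, \hat\theta(\tau)) = \left(\tau - 1(Y_i \le D_i'\hat\alpha(\tau) + X_i'\hat\beta(\tau))\right)$ and $\hat\Psi_i(\tau) = \hat V_i(\tau)\,[\hat\Phi_i(\tau)', X_i']'$.

2. Test of Constant Effect: In this case, $\hat r(\cdot) = \hat\theta(1/2)$ is an IV-QR estimate, and for $\hat l_i(\cdot)$ defined above

\[ \hat z_i(\tau) = R(\tau)\left[\hat J(\tau)^{-1} \hat l_i(\tau, \hat\theta(\tau))\hat\Psi_i(\tau) - \hat J(1/2)^{-1} \hat l_i(1/2, \hat\theta(1/2))\hat\Psi_i(1/2)\right]. \]

3. Test of Dominance Effect: Since $r = 0$, the score is

\[ \hat z_i(\tau) = R(\tau)\left[\hat J(\tau)^{-1} \hat l_i(\tau, \hat\theta(\tau))\hat\Psi_i(\tau)\right]. \]

4. Test of Exogeneity: If $r$ is estimated using conventional quantile regression as defined in Example 4, the score is given by

\[ \hat z_i(\tau) = R(\tau)\left[\hat J(\tau)^{-1} \hat l_i(\tau, \hat\theta(\tau))\hat\Psi_i(\tau) - \hat H(\tau)^{-1} \hat d_i(\tau, \hat\vartheta(\tau))\tilde X_i\right], \]

where $\hat d_i(\tau, \hat\vartheta(\tau)) = \left(\tau - 1(Y_i \le \tilde X_i'\hat\vartheta(\tau))\right)$, $\tilde X_i = (D_i', X_i')'$, and $H(\tau) = E f_{Y|\tilde X}(\vartheta(\tau)'\tilde X)\,\tilde X \tilde X'$.

8 $\hat f(W, \tau)$ is consistent for $f(W, \tau)$ under $L_2(P)$ if $\sup_\tau \operatorname{Var}\left[\tilde f(W, \tau) - f(W, \tau)\right]\big|_{\tilde f = \hat f} \to_p 0$.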

Estimation of the $H$ and $J$ matrices is further discussed in Section 4.5.2. Theorem 3 establishes the properties of the proposed resampling method.

Theorem 3 (Resampling Inference). Suppose Assumptions 2-3, and that we have $\hat J(\tau) \to_p J(\tau)$ and $\hat H(\tau) \to_p H(\tau)$ uniformly in $\tau$ over $\mathcal{T}$. Then as $b/n \to 0$, $b \to \infty$, $n \to \infty$:

1. When the null is true ($p = 0$), if $\Gamma$ is continuous at $\Gamma^{-1}(1 - \alpha)$:

\[ c_{b,n}(1 - \alpha) \to_p \Gamma^{-1}(1 - \alpha), \quad P_n(S_n > c_{b,n}(1 - \alpha)) \to \alpha. \]

2. Under the local alternative I.1(a) ($p \ne 0$), if $\Gamma$ is continuous at $\Gamma^{-1}(1 - \alpha)$:

\[ c_{b,n}(1 - \alpha) \to_p \Gamma^{-1}(1 - \alpha), \quad P_n(S_n > c_{b,n}(1 - \alpha)) \to \beta, \]

where $\beta = \Pr(f(v_0(\cdot) + p(\cdot)) > \Gamma^{-1}(1 - \alpha))$.

3. Under the global alternative I.1(b), $S_n \to_p \infty$:

\[ c_{b,n}(1 - \alpha) = O_p(1), \quad P_n(S_n > c_{b,n}(1 - \alpha)) \to 1. \]

4. $\Gamma(x)$ is absolutely continuous at $x > 0$ when the covariance function of $v_0$ is nondegenerate a.e. in $\tau$.

Thus the resampling mechanism consistently estimates the critical values, and the resampling tests are asymptotically unbiased and have the same power as the corresponding test in Theorem 2 that uses a known critical value. Thus, the tests are consistent and have non-trivial power against the $1/\sqrt{n}$-local alternatives, as discussed after Theorem 2.

4.5.  Practical Implementation. Here we discuss practical implementation of the resampling inference.

4.5.1.  Discretization. It is more practical to use a grid $\mathcal{T}_n$ in place of $\mathcal{T}$ with the largest cell size $\delta_n \to 0$ as $n \to \infty$.

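Corollary 5. Theorems 1-3 are valid for piecewise constant approximations of the finite-sample processes using $\mathcal{T}_n$, given that $\delta_n \to 0$ as $n \to \infty$.

4.5.2.  Choice and Estimation of $\Lambda(\tau)$, $J(\tau)$, $H(\tau)$. In order to increase the test's power we could set

\[ \Lambda^*(\tau) = [\Omega^*(\tau)]^{-1} = \operatorname{Var}[z_i(\tau)]^{-1}, \]

which is a (generalized) Anderson-Darling weight.9 In iid samples, there are many methods for estimating $\Lambda^*(\tau)$ uniformly consistently in $\tau$.

An obvious estimate of $\Omega^*(\tau)$, uniformly consistent by I.3-I.4, is given by:

\[ \hat\Omega^*(\tau) = \frac{1}{n} \sum_{i=1}^{n} \hat z_i(\tau)\hat z_i(\tau)', \quad \hat z_i(\tau) = R(\tau)\left[\hat J(\tau)^{-1} \hat l_i(\tau, \hat\theta(\tau))\hat\Psi_i(\tau) - \hat H(\tau)^{-1} \hat d_i(\tau, \hat r(\tau))\hat\Upsilon_i(\tau)\right]. \]

A uniformly consistent estimate of $J(\tau)$ is given by Powell's (1986) estimator,

\[ \hat J(\tau) = \frac{1}{n} \sum_{i=1}^{n} K_h\left(Y_i - D_i'\hat\alpha(\tau) - X_i'\hat\beta(\tau)\right)\hat\Psi_i(\tau)\,[D_i', X_i'], \]

where $K_h(x) = h^{-1}K(x/h)$ and $K(\cdot)$ is a compactly supported symmetric kernel with two uniformly bounded derivatives, and $h \sim Cn^{-1/5}$. Estimates of $H$ are only needed in Examples 2 and 4. In Example 2,

\[ \hat H(\tau) = \hat J(1/2), \]

and in Example 4, a uniformly consistent estimate of $H(\tau)$ is given by Powell's estimator:

\[ \hat H(\tau) = \frac{1}{n} \sum_{i=1}^{n} K_h\left(Y_i - \tilde X_i'\hat\vartheta(\tau)\right)\tilde X_i \tilde X_i'. \]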
4.5.3.  Choice of the Block Size. In Politis, Romano, and Wolf (1999) various rules are suggested for choosing an appropriate subsample size, including the calibration and minimum volatility methods. The calibration method involves picking the optimal block size and appropriate critical values on the basis of simulation experiments conducted with a model that approximates the situation at hand. The minimum volatility method involves picking (or combining) among the block sizes that yield the most stable critical values. More detailed suggestions emerge from Sakov and Bickel (1999), who suggest that choosing $b = kn^{2/5}$ yields optimal performance for a related subsampling method that involves recomputing the estimates. In our setting, this choice also appears reasonable. Our experiments and those in Chernozhukov (2002) indicate that selecting $k$ between 3 and 10 is attractive both computationally and qualitatively.
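The block-size rule is simple to apply, and it can also be inverted to see which constant a given block size implies. The helper names below are hypothetical. The two sample-size and block-size pairs are the ones reported for the empirical examples in this paper (n = 111 with b = 50, and n = 329,509 with b = 1000).

```python
def block_size(n, k):
    """Block size b = k * n^(2/5), rounded to the nearest integer (at least 1)."""
    return max(1, round(k * n ** 0.4))

def implied_k(b, n):
    """Constant k implied by a block size b at sample size n."""
    return b / n ** 0.4

# Constants implied by the block sizes used in the two empirical examples.
k_demand = implied_k(50, 111)
k_schooling = implied_k(1000, 329509)
```

Both implied constants fall inside the suggested range of 3 to 10.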


9 This choice is not readily suited to Example 2, since $\operatorname{Var} z_i(1/2) = 0$. However, we can cut out $[1/2 - \epsilon, 1/2 + \epsilon]$ from the interval $\mathcal{T}$. Alternatively, one may always simply use $\Lambda(\tau) = I$.

5.  Empirical Analysis

This  section  contains  results  from  two  empirical  examples  that  illustrate  the  use  of  the 
inference  procedures  presented  in  this  paper.  The  examples  correspond  to  the  models 
outlined  in  Sections  2.2  and  2.3.  In  the  first  example,  we  estimate  the  market  demand  for 
fish,  and  in  the  second,  we  consider  the  returns  to  schooling. 

5.1.  The Structure of Demand for Fish. In this section, we present estimates of demand elasticities which may potentially vary with the level of demand, $\tau$, and assess the impact of price on the quantity distribution. The data contain observations on price and quantity
of  fresh  whiting  sold  in  the  Fulton  fish  market  in  New  York  over  the  five  month  period 
from  December  2,  1991  to  May  8,  1992.  These  data  were  used  previously  in  Graddy  (1995) 
to  test  for  imperfect  competition  in  the  market  and  later  in  Angrist  and  Imbens  (2000)  to 
illustrate  the  use  of  the  conventional  IV  estimator  as  a  weighted  average  of  heterogeneous 
demands.  The  price  and  quantity  data  are  aggregated  by  day,  with  the  price  measured  as 
the  average  daily  price  for  the  dealer  and  the  quantity  as  the  total  amount  of  fish  sold  that 
day.  The  data  also  contain  information  on  the  day  of  the  week  of  each  observation  and 
variables  indicating  weather  conditions  at  sea,  which  are  used  as  instruments  to  identify  the 
demand  equation.  The  total  sample  consists  of  111  observations  for  the  days  in  which  the 
market  was  open  during  the  sample  period. 

The demand function we estimate takes a standard Cobb-Douglas form:

\[ Q_{\ln(Y_p)|X}(\tau) = A(\tau) + \alpha(\tau)\ln(p), \]

where $Y_p$ is the demand when price is set at $p$. The elasticity $\alpha(\tau)$ is allowed to vary across the quantiles, $\tau$, of the demand level. Note that this is a Cobb-Douglas version of the demand model with random elasticity discussed in Section 2.2.

The left panel of Figure 1 provides the estimates of elasticities obtained by IV-QR of $\ln(Y)$ on $\ln(P)$ using wind speed as the instrumental variable, while the right panel depicts standard quantile regression (QR) estimates of the effect of $\ln(P)$ on $\ln(Y)$. The shaded region in each figure represents the 90% confidence interval. The estimated model does not include covariates, but the estimated elasticities are not sensitive to the inclusion of controls for days of the week or other covariates.

In general, the point estimates obtained through IV-QR differ substantially from the corresponding QR estimates. The QR estimates appear to be approximately constant across the entire range of the quantity distribution and are uniformly small in magnitude. The IV-QR estimates, on the other hand, demonstrate a great deal of variability, ranging from near -2 at low quantiles to -0.5 in the upper end of the distribution. Except at high quantiles, the IV-QR estimates of the elasticities are uniformly greater in magnitude than the price effects predicted by QR, demonstrating an upward bias induced by the joint determination of price and quantity in the market. Also note that, as with 2SLS and OLS, the interpretations of the IV-QR and QR estimates are very different. IV-QR estimates a demand model, while QR estimates the conditional quantiles of the equilibrium quantity as a function of the equilibrium price.

Table 1. The test results for Demand Equation, using b = 50 (subsampling with replacement)

Null Hypothesis                                     Alternative Hypothesis    P-value for CM Test Statistic
No Effect, $\alpha(\cdot) = 0$                      Non-zero Effect           <.01
Fixed Elast., $\alpha(\cdot) = \alpha$              Random Elasticity         .28
Dominance, $\alpha(\cdot) \le 0$                    Non-Dominance             .50
Exogeneity, $\alpha(\cdot) = \alpha_{QR}(\cdot)$    Endogeneity               .18

Results from formal tests of the location-shift hypothesis and the endogeneity hypotheses are contained in Table 1. We fail to reject the dominance hypothesis that $\alpha(\cdot) \le 0$, indicating downward sloping demand at all quantiles. The rest of the results require careful discussion in view of the small sample, n = 111. Due to the simultaneous determination of price and quantity, as expected there is clear evidence against the null of exogeneity. In particular, we reject the null of no endogeneity at the 10% level (recall that variances cancel each other for Hausman tests) for the lower quartile ($\tau = .25$) of demand. However, the overall results are weaker, giving a p-value of only 18%. The hypothesis of constant elasticity is also rejected at the 10% level when we compare the lower quartile elasticity with the median elasticity. The overall results are weaker, yielding only a 28% p-value. These results indicate that the elasticity of demand is likely heterogeneous.


5.2.  The Structure of the Returns to Schooling. As a further illustration of the use of the estimation and inference methods presented in this paper, we use the data and methodology employed in Angrist and Krueger (1991) to estimate the effects of schooling on earnings. In particular, we estimate linear conditional quantile models of the form

\[ Q_{\ln(Y_s)|X}(\tau) = \alpha(\tau)S + X'\beta(\tau), \]

where $S$ is reported years of schooling and $X$ is a vector of covariates, using quarter of birth as an instrument for education.10

We focus on the specification used in Angrist and Krueger (1991) which includes state of birth effects, year of birth effects, and a constant in the covariate vector.11 The sample we consider consists of 329,509 males from the 1980 U.S. Census who were born between 1930 and 1939 and have data on weekly wages, years of completed education, state of birth, year of birth, and quarter of birth. The sample was selected based on the criteria found in Appendix 1 of Angrist and Krueger (1991).

10 Specifically, we use the linear projection of $S$ onto the covariates $X$ and three dummies for first through third quarter of birth, with fourth quarter as the excluded category, as the instrumental variable.

11 Note that the estimates of the schooling coefficient are not sensitive to the specification of the $X$ vector.

IV-QR  and  QR  estimates  of  the  schooling  coefficient  are  provided  in  Figure  2.  The  shaded 
region  in  each  panel  represents  the  95%  confidence  interval.  Both  the  quantile  and  inverse 
quantile  regression  estimates  suggest  that  the  "returns  to  schooling"  vary  over  the  earnings 
distribution.  The  first  row  in  Table  2  reports  the  results  from  a  test  of  the  hypothesis  of  a 
constant  treatment  effect  for  the  IV-QR  estimates  which  is  strongly  rejected.  The  variability 
of  the  treatment  effects  is  most  apparent  in  the  IV-QR  estimates.  While  the  QR  estimates 
do  vary  statistically,12  they  are  all  closely  clustered  around  the  OLS  estimate.  The  practical 
lack  of  variability  in  the  QR  estimates  is  clearly  demonstrated  in  the  first  panel  of  Figure 
2,  which  plots  both  the  IV-QR  (solid  line)  and  QR  (dashed  line)  estimates.  Relative  to  the 
IV-QR  estimates,  the  QR  estimates  appear  to  be  approximately  constant. 

The shapes of the estimated treatment effects are also interesting. The QR estimates exhibit a distinct u-shape, implying higher returns to schooling to those in the tails of the distribution than to those in the middle. However, if schooling is endogenous to the earnings equation, these estimates do not consistently estimate the QTE and have no causal interpretation. IV-QR estimates, on the other hand, are consistent for the QTE under endogeneity and show quite different results than those obtained through standard QR. In particular, the IV-QR results show returns to schooling of approximately 20% per year of additional schooling at low quantiles in the earnings distribution, which decrease as the quantile index increases toward the middle of the distribution and then remain approximately constant at levels near the QR and OLS estimates. This implies that the largest gains to additional years of schooling accrue to those at the low end of the earnings distribution. This observation is consistent with the notion that people with high unobserved "ability", measured as the quantile index $\tau$, will generate high earnings regardless of their education level, while those with lower "ability" gain more from the training provided by formal education.13

The  third  row  of  Table  2  reports  the  results  from  the  test  of  stochastic  dominance.  As 
would  be  expected,  it  fails  to  reject  the  null  hypothesis  of  stochastic  dominance,  confirming 
our  intuition  that  schooling  weakly  increases  earnings  across  the  distribution. 

In the final row of Table 2, we test the endogeneity hypothesis. The test strongly rejects the null hypothesis of no endogeneity, confirming the need to instrument for schooling in the earnings equation. Note that this result contrasts with the conclusion which could be drawn from a Hausman test based on the 2SLS and OLS estimates, which fails to reject the null at the 5% level.14 Again, this confirms our intuition that endogeneity contaminates standard estimates of the returns to schooling and underscores the importance of accounting for this endogeneity in estimation.

12 The test for the quantile regression estimates is not reported, but also strongly rejects the null hypothesis of a constant treatment effect.

13 The term "ability" is used to characterize the unobserved component of earnings, which likely captures elements of ability and motivation as well as noise.

14 The test statistic is 3.64 and is distributed as a $\chi^2_1$.

Table 2. The test results for Schooling Equation, using b = 1000 (subsampling with replacement)

Null Hypothesis                                     Alternative Hypothesis    P-value for KS Test Statistic
No Effect, $\alpha(\cdot) = 0$                      Non-zero Effect           <.01
Constant Effect, $\alpha(\cdot) = \alpha$           Varying Effect            .03
Dominance, $\alpha(\cdot) \ge 0$                    Non-Dominance             .50
Exogeneity, $\alpha(\cdot) = \alpha_{QR}(\cdot)$    Endogeneity               .04

Overall, the results indicate that the effect of schooling on earnings is quite heterogeneous, with the largest returns accruing to those who fall in the lower tail of the earnings distribution. The example also illustrates the variety of interesting distributional hypotheses that can be tested using the inference procedures presented in this paper. The results demonstrate that estimates of treatment effects which focus on a single feature of the outcome distribution may fail to capture the full impact of the treatment and that examining additional features may enhance our understanding of the economic relationships involved.

6.  Conclusion 

In  this  paper,  we  described  how  instrumental  variable  quantile  regression  can  be  used  to 
evaluate  the  impact  of  treatment  on  the  entire  distribution  of  outcomes  when  the  treatment 
is  self-selected  or  selected  in  relation  to  potential  outcomes.  We  introduced  an  instrumental 
variable  quantile  regression  process  and  the  set  of  inferences  derived  from  it,  focusing  on 
tests  of  distributional  equality,  non-constant  treatment  effects,  conditional  dominance,  and 
exogeneity.  Inference,  which  is  subject  to  the  Durbin  problem,  was  handled  via  a  method 
of score resampling. In the paper, we demonstrated that the method is simple and computationally convenient and produces valid inference. The approach was illustrated through
two  examples:  estimation  of  the  demand  curve  in  a  supply-demand  system  and  estimation 
of  the  returns  to  schooling.  In  both  cases,  the  hypotheses  of  a  constant  treatment  effect 
and  exogeneity  were  rejected.  The  results  suggest  that  estimates  of  treatment  effects  that 
focus  on  a  single  feature  of  the  outcome  distribution  may  fail  to  capture  the  full  impact  of 
the  treatment  and  serve  to  illustrate  the  variety  of  distributional  hypotheses  that  can  be 
tested based on the quantile regression process.

Appendix A.  Proofs

We use the following empirical processes in the sequel, for $W = (Y, D, X, Z)$:

\[ f \mapsto E_n f(W) = \frac{1}{n} \sum_{i=1}^{n} f(W_i), \quad f \mapsto G_n f(W) = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left(f(W_i) - E f(W_i)\right). \]

For example, if $\hat f$ is an estimated function, $G_n \hat f(W)$ means $\frac{1}{\sqrt{n}} \sum_{i=1}^{n} \left(f(W_i) - E f(W_i)\right)_{f = \hat f}$. Outer and inner probabilities, $P^*$ and $P_*$, are defined as in van der Vaart (1998). In this paper $\to_p$ means convergence in (outer) probability, and $\to_d$ means convergence in distribution. Wp $\to 1$ means "with (inner) probability going to 1." We will say that the process $\{l \mapsto v_n(l), l \in \mathcal{L}\}$ is stochastically equicontinuous (s.e.) in $\ell^\infty(\mathcal{L})$ if for each $\epsilon > 0$ and $\eta > 0$, there is $\delta > 0$:

\[ \limsup_{n \to \infty} P^*\Big(\sup_{\rho(l, l') < \delta} |v_n(l) - v_n(l')| > \eta\Big) < \epsilon \]

for some pseudo-metric $\rho$ on $\mathcal{L}$, such that $(\mathcal{L}, \rho)$ is totally bounded.

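A.1.  Proof of Proposition 1.15 Conditioning on $X = x$ is suppressed. For P-a.e. value $z$ of $Z$

\begin{align*}
P[Y \le q(D, \tau) \mid Z = z] &\overset{(1)}{=} P[q(D, U_D) \le q(D, \tau) \mid Z = z] \\
&\overset{(2)}{=} P[U_D \le \tau \mid Z = z] \\
&\overset{(3)}{=} \int P[U_D \le \tau \mid Z = z, V = v]\, dP[V = v \mid Z = z] \\
&\overset{(4)}{=} \int P[U_{\delta(z,v)} \le \tau \mid Z = z, V = v]\, dP[V = v \mid Z = z] \\
&\overset{(5)}{=} \int P[U_0 \le \tau \mid Z = z, V = v]\, dP[V = v \mid Z = z] \\
&\overset{(6)}{=} P[U_0 \le \tau \mid Z = z] \\
&\overset{(7)}{=} \tau. \tag{A.1}
\end{align*}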

Equality (1) is by A1 and A5. Equality (3) is by definition. Equality (4) is by A2. Equality (5) is by the similarity assumption A4: for each $d$, conditional on $(V = v, X = x, Z = z)$,

\[ U_{\delta(z,v)} \text{ equals in distribution to } U_0. \]

Equality (6) is by definition and equality (7) is by A3. Note that equality (2) is immediate when $\tau \mapsto q(d, \tau)$ is continuous, since we assumed that $\tau \mapsto q(d, \tau)$ is strictly increasing. To show that (2) holds more generally, simply note that for $\tau \in (0, 1)$ the event $\{U_D \le \tau\}$ implies the event $\{q(D, U_D) \le q(D, \tau)\}$ by $\tau \mapsto q(d, \tau)$ being non-decreasing on $(0, 1)$ for each $d$. On the other hand, the event $\{q(D, U_D) \le q(D, \tau)\}$ implies the event $\{U_D \le \tau\}$, since $\tau \mapsto q(d, \tau)$ is strictly increasing and left-continuous16 in $(0, 1)$ for each $d$.


Finally, since $\tau \mapsto q(d, \tau)$ is strictly increasing and left-continuous, we have

\[ P[q(D, U_D) = q(D, \tau) \mid Z = z] = 0, \]

so that P-a.e.

\[ P[Y \le q(D, \tau) \mid Z] = P[Y < q(D, \tau) \mid Z]. \qquad \blacksquare \]

15 The proof is that given in Chernozhukov and Hansen (2001) and is given here for completeness.

16 $\tau \mapsto q(d, \tau)$ is said to be left-continuous if $\lim_{\tau' \uparrow \tau} q(d, \tau') = q(d, \tau)$.

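A.2.  Proof of Theorem 1. In the proof $W$ denotes $(Y, D, X, Z)$. Define for $\vartheta = (\beta, \gamma)$ and $\vartheta_0 = (\beta(\tau), 0)$ and $\varphi_\tau(u) = (1(u < 0) - \tau)$

\[ \hat f(W, \alpha, \vartheta, \tau) = \varphi_\tau(Y - D'\alpha - X'\beta - \hat\Phi(\tau)'\gamma)\,\hat\Psi(\tau), \]
\[ f(W, \alpha, \vartheta, \tau) = \varphi_\tau(Y - D'\alpha - X'\beta - \Phi(\tau)'\gamma)\,\Psi(\tau), \]

where $\Psi(\tau) = V(\tau) \cdot (X', \Phi(\tau)')'$, $\Phi(\tau) = \Phi(\tau, X, Z)$, $\hat\Psi(\tau) = \hat V(\tau) \cdot (X', \hat\Phi(\tau)')'$, $\hat\Phi(\tau) = \hat\Phi(\tau, X, Z)$;

\[ \hat g(W, \alpha, \vartheta, \tau) = \rho_\tau(Y - D'\alpha - X'\beta - \hat\Phi(\tau)'\gamma)\,\hat V(\tau), \]
\[ g(W, \alpha, \vartheta, \tau) = \rho_\tau(Y - D'\alpha - X'\beta - \Phi(\tau)'\gamma)\,V(\tau), \]

where $\rho_\tau(u) = (\tau - 1(u < 0))u$. Let

\[ Q_n(\alpha, \vartheta, \tau) = E_n \hat g(W, \alpha, \vartheta, \tau), \quad Q(\alpha, \vartheta, \tau) = E g(W, \alpha, \vartheta, \tau), \]

and

\[ \hat\vartheta(\alpha, \tau) = (\hat\beta(\alpha, \tau), \hat\gamma(\alpha, \tau)) = \arg\inf_{\vartheta \in \mathcal{B} \times \mathcal{G}} Q_n(\alpha, \vartheta, \tau), \]
\[ \vartheta(\alpha, \tau) = (\beta(\alpha, \tau), \gamma(\alpha, \tau)) = \arg\inf_{\vartheta \in \mathcal{B} \times \mathcal{G}} Q(\alpha, \vartheta, \tau), \]
\[ \hat\alpha(\tau) = \arg\inf_{\alpha \in \mathcal{A}} \|\hat\gamma(\alpha, \tau)\|, \quad \alpha^* = \arg\inf_{\alpha \in \mathcal{A}} \|\gamma(\alpha, \tau)\|. \]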

Step   1   (Identification)  Here  we  show  that  9(t)  =   (a(T)' ,  0(t)')  uniquely  solves  the  limit 
optimization  problem.  Define 

11(9, t)  =  EP  [pT(Y  -  D'a  -  X'0)9] ,    9t=Vt-  [X't  :  *J]' 
W r)  =  Q^T^Ep  l<Pr(Y  -  D'a  -  X'flV] , 

We  know  by  Proposition  1  that  6(t)  =  (a(r)' ,0(t)')  solves 

II(0(t),t)=O. 
Suppose  there  exits  another  0  in  the  parameter  set  that  solves  this  equation 

n(0*(r),r)=O. 
Then  for  any  comformable  non-zero  vector  A  Taylor  expansion  gives 

\'(TI(9*(t),t)-I1(9,t))=\'J(9X(t),t)\*=0 

for  0\(t)  is  on  the  line  segment  that  connects  9*{t)  and  9(t),  and 

A*  =  (0*(r)-0(r)) 
which  yields  a  contradiction  by  the  full  rank  assumption  after  setting  A  =  A*. 
So we have that the true parameters $(\alpha(\tau),\beta(\tau))$ solve the equation

$$E\,\varphi_\tau(Y - D'\alpha(\tau) - X'\beta(\tau) - \Phi(\tau)'0)\,\Psi(\tau) = 0.$$

On the other hand, by R3 and by convexity in $\vartheta$ of the limit optimization problem for each $\tau$ and $\alpha$, $\vartheta(\alpha,\tau)$ uniquely solves the equation

$$E\,\varphi_\tau(Y - D'\alpha - X'\beta(\alpha,\tau) - \Phi(\tau)'\gamma(\alpha,\tau))\,\Psi(\tau) = 0.$$

We need to find $\alpha^*(\tau)$ such that this equation holds and the norm of $\gamma(\alpha,\tau)$ is as small as possible. Setting $\alpha^* = \alpha(\tau)$ makes the norm of $\gamma(\alpha^*,\tau)$ equal to zero by Proposition 1. Thus $\alpha^*(\tau) = \alpha(\tau)$ is a solution; by the preceding argument it is unique and $\beta(\alpha^*(\tau),\tau) = \beta(\tau)$.

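Step 2 (Consistency) By Lemma 2,

$$\sup_{(\alpha,\vartheta,\tau)\in\mathcal{A}\times(\mathcal{B}\times\mathcal{G})\times\mathcal{T}} \left\| Q_n(\alpha,\vartheta,\tau) - Q(\alpha,\vartheta,\tau) \right\| \xrightarrow{p} 0.$$

This implies by Lemma 1 for extremum processes:

$$\sup_{(\alpha,\tau)\in\mathcal{A}\times\mathcal{T}} \left\| \hat\vartheta(\alpha,\tau) - \vartheta(\alpha,\tau) \right\| \xrightarrow{p} 0, \tag{A.2}$$

which in turn implies

$$\sup_{(\alpha,\tau)\in\mathcal{A}\times\mathcal{T}} \left| \|\hat\gamma(\alpha,\tau)\|_A - \|\gamma(\alpha,\tau)\|_A \right| \xrightarrow{p} 0,$$

which by invoking Lemma 1 again implies

$$\sup_{\tau\in\mathcal{T}} \left\| \hat\alpha(\tau) - \alpha(\tau) \right\| \xrightarrow{p} 0,$$

which by equation (A.2) implies

$$\sup_{\tau\in\mathcal{T}} \left\| \hat\beta(\hat\alpha(\tau),\tau) - \beta(\tau) \right\| \xrightarrow{p} 0, \qquad \sup_{\tau\in\mathcal{T}} \left\| \hat\gamma(\hat\alpha(\tau),\tau) - 0 \right\| \xrightarrow{p} 0.$$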

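Step 3 (Asymptotics) By the computational properties of the quantile regression estimator $\hat\vartheta(\alpha_n)$, for any $\alpha_n(\tau)$ in a small ball at $\alpha(\tau)$,

$$O(K/\sqrt{n}) = \sqrt{n}\,E_n \hat f(W,\alpha_n(\tau),\hat\vartheta(\alpha_n(\tau),\tau),\tau). \tag{A.3}$$

By Lemma 2, the following expansion of the r.h.s. is valid for any $\alpha_n(\tau) - \alpha(\tau) \to 0$ uniformly in $\tau$:17

$$\begin{aligned}
\sqrt{n}\,E_n \hat f(W,\alpha_n(\tau),\hat\vartheta(\alpha_n(\tau),\tau),\tau) &= G_n \hat f(W,\alpha_n(\tau),\hat\vartheta(\alpha_n(\tau),\tau),\tau) + \sqrt{n}\,E \hat f(W,\alpha_n(\tau),\hat\vartheta(\alpha_n(\tau),\tau),\tau) \\
&= G_n f(W,\alpha(\tau),\vartheta(\alpha(\tau),\tau),\tau) + o_p(1) + \sqrt{n}\,E \hat f(W,\alpha_n(\tau),\hat\vartheta(\alpha_n(\tau),\tau),\tau).
\end{aligned}$$

Expanding the last line further, uniformly in $\tau$,

$$\begin{aligned}
O(K/\sqrt{n}) ={}& G_n f(W,\alpha(\tau),\vartheta(\tau)) + o_p(1) \\
&+ (J_\vartheta(\tau) + o_p(1))\sqrt{n}\,(\hat\vartheta(\alpha_n(\tau),\tau) - \vartheta(\tau)) \\
&+ (J_\alpha(\tau) + o_p(1))\sqrt{n}\,(\alpha_n(\tau) - \alpha(\tau)),
\end{aligned} \tag{A.5}$$

17 Note that by convention in empirical process theory $E\hat f(W)$ means $(Ef(W))_{f=\hat f}$.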
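where

$$J_\vartheta(\tau) = \frac{\partial}{\partial(\beta',\gamma')} E_P\left[\varphi_\tau(Y - D'\alpha(\tau) - X'\beta - \Phi(\tau)'\gamma)\,\Psi(\tau)\right]\Big|_{(\gamma,\beta)=(0,\beta(\tau))}.$$

In other words, for any $\alpha_n(\tau) - \alpha(\tau) \xrightarrow{p} 0$ uniformly in $\tau$,

$$\sqrt{n}\,(\hat\vartheta(\alpha_n(\tau),\tau) - \vartheta(\tau)) = -J_\vartheta^{-1}(\tau)\,G_n f(W,\alpha(\tau),\vartheta(\tau)) - J_\vartheta^{-1}(\tau)J_\alpha(\tau)[1 + o_p(1)]\sqrt{n}\,(\alpha_n(\tau) - \alpha(\tau)) + o_p(1),$$

i.e.

$$\sqrt{n}\,(\hat\gamma(\alpha_n(\tau),\tau) - 0) = -J_\gamma(\tau)\,G_n f(W,\alpha(\tau),\vartheta(\tau)) - J_\gamma(\tau)J_\alpha(\tau)[1 + o_p(1)]\sqrt{n}\,(\alpha_n(\tau) - \alpha(\tau)) + o_p(1),$$

where

$$[J_\beta(\tau)' : J_\gamma(\tau)']' = J_\vartheta^{-1}(\tau).$$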

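Center a shrinking closed ball at $\alpha(\tau)$ for each $\tau$ and denote those balls $B_n(\alpha(\tau))$. With probability approaching one,

$$\hat\alpha(\tau) = \arg\inf_{\alpha_n(\tau)\in B_n(\alpha(\tau))} \|\hat\gamma(\alpha_n(\tau),\tau)\|_A.$$

Observe that by Lemma 2

$$\sqrt{n}\,\|\hat\gamma(\alpha_n(\tau),\tau)\|_A = \left\| O_p(1) - J_\gamma(\tau)J_\alpha(\tau)[1 + o_p(1)]\sqrt{n}\,(\alpha_n(\tau) - \alpha(\tau)) \right\|_A.$$

Since $J_\gamma(\tau)J_\alpha(\tau)$ and $A$ have full rank, $\sqrt{n}\,(\hat\alpha(\tau) - \alpha(\tau)) = O_p(1)$. Hence by Lemma 1

$$\sqrt{n}\,(\hat\alpha(\tau) - \alpha(\tau)) \doteq \arg\inf_{\mu} \left\| -J_\gamma(\tau)\,G_n f(W,\alpha(\tau),\vartheta(\tau)) - J_\gamma(\tau)J_\alpha(\tau)\mu \right\|_A,$$

where $\doteq$ means that the plims of the lhs and rhs agree in $\ell^\infty(\mathcal{T})$. Conclude that

$$\sqrt{n}\,(\hat\alpha(\tau) - \alpha(\tau)) = -\left(J_\alpha(\tau)'J_\gamma(\tau)'A\,J_\gamma(\tau)J_\alpha(\tau)\right)^{-1}\left(J_\alpha(\tau)'J_\gamma(\tau)'A\,J_\gamma(\tau)\right) G_n f(W,\alpha(\tau),\vartheta(\tau)) = O_p(1)$$

and

$$\sqrt{n}\,(\hat\vartheta(\hat\alpha(\tau),\tau) - \vartheta(\alpha(\tau),\tau)) = -J_\vartheta^{-1}(\tau)\left[I - J_\alpha(\tau)\left(J_\alpha(\tau)'J_\gamma(\tau)'A\,J_\gamma(\tau)J_\alpha(\tau)\right)^{-1}J_\alpha(\tau)'J_\gamma(\tau)'A\,J_\gamma(\tau)\right] G_n f(W,\alpha(\tau),\vartheta(\tau)) = O_p(1).$$

Now note that, because $J_\gamma(\tau)J_\alpha(\tau)$ is invertible, the weighting matrix cancels and we have

$$\sqrt{n}\,(\hat\gamma(\hat\alpha(\tau),\tau) - \gamma(\alpha(\tau),\tau)) = -J_\gamma(\tau)\left[I - J_\alpha(\tau)\left(J_\gamma(\tau)J_\alpha(\tau)\right)^{-1}J_\gamma(\tau)\right] G_n f(W,\alpha(\tau),\vartheta(\tau)) = 0 \times O_p(1).$$

Instead of working out the algebra to see a drastic simplification, using this fact and putting $(\alpha_n(\tau),\hat\vartheta(\alpha_n(\tau),\tau)) = (\hat\alpha(\tau),\hat\vartheta(\hat\alpha(\tau),\tau)) = (\hat\alpha(\tau),\hat\beta(\tau),0 + o_p(1/\sqrt{n}))$ back into the expansion (A.5) we have, uniformly in $\tau$,

$$-G_n f(W,\alpha(\tau),\vartheta(\tau)) = J(\tau)\sqrt{n}\begin{pmatrix}\hat\alpha(\tau) - \alpha(\tau) \\ \hat\beta(\tau) - \beta(\tau)\end{pmatrix} + o_p(1).$$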
Next, by Lemma 2,

$$G_n f(W,\alpha(\tau),\vartheta(\tau)) \Rightarrow G(\tau) \quad \text{in } \ell^\infty(\mathcal{T}),$$

where $G(\tau)$ is the Gaussian process with covariance function

$$S(\tau,\tau') = (\min(\tau,\tau') - \tau\tau')\,E\,\Psi(\tau)\Psi(\tau')',$$

which yields the desired conclusion


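$$\sqrt{n}\,(\hat\theta(\cdot) - \theta(\cdot)) \Rightarrow J(\cdot)^{-1}G(\cdot) \quad \text{in } \ell^\infty(\mathcal{T}).$$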
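
The factor $\min(\tau,\tau') - \tau\tau'$ in the covariance function is the Brownian bridge kernel. It arises because, for $U$ uniform on $(0,1)$, the scores $1(U < \tau) - \tau$ have exactly this covariance across quantile indices. The short Python check below illustrates the identity; the evenly spaced grid standing in for draws of $U$ and the function names are assumptions made only for illustration.

```python
def bridge_kernel(tau, tau_prime):
    """Brownian bridge kernel min(tau, tau') - tau * tau'."""
    return min(tau, tau_prime) - tau * tau_prime

def empirical_score_cov(tau, tau_prime, n=1000):
    """Average of (1(u < tau) - tau) * (1(u < tau') - tau') over an evenly
    spaced grid of n points in (0, 1) that stands in for uniform draws."""
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n
        s1 = (1.0 if u < tau else 0.0) - tau
        s2 = (1.0 if u < tau_prime else 0.0) - tau_prime
        total += s1 * s2
    return total / n
```

For instance, `empirical_score_cov(0.25, 0.5)` and `bridge_kernel(0.25, 0.5)` agree up to floating-point error.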


A.3. Proof of Theorem 2. Since $f$ is either the Kolmogorov-Smirnov or the Cramér-von Mises statistic, or a one-sided version of these statistics, Part 1 follows by the continuous mapping theorem. Part 2 follows by observing that

$$f(\sqrt{n}\,g(\cdot)) \xrightarrow{p} \infty \implies f(\sqrt{n}\,g(\cdot) + G_n(\cdot)) \xrightarrow{p} \infty,$$

for any tight element $G_n(\cdot) = O_p(1)$ in $\ell^\infty(\mathcal{T})$ and any such $f$, once the null is violated (once the composite null is violated for one-sided tests). We also need the distribution function of these limiting statistics to be continuous. This follows by Theorem 11.1 in Davydov, Lifshits, and Smorodina (1998): the distribution of the functional $f(v_0(\cdot) + p(\cdot))$, where $f$ is of the specified sort, is absolutely continuous at $x > 0$ once $v_0(\cdot)$ has a nondegenerate covariance kernel.

That $\beta > \alpha$ follows from a generalized Anderson's Lemma for general Banach spaces, Lemma 3.11.4 in van der Vaart and Wellner (1996). ■
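
The two-sided Kolmogorov-Smirnov and Cramér-von Mises functionals referred to in this proof can be sketched on a finite grid of quantile indices. The Python sketch below assumes a scalar process evaluated on a grid and unit weighting; the function names and the trapezoid-rule approximation of the integral are choices made for illustration and are not taken from the paper.

```python
import math

def ks_statistic(v, n):
    """Kolmogorov-Smirnov form: sqrt(n) times the largest |v(tau)| on the grid."""
    return math.sqrt(n) * max(abs(x) for x in v)

def cm_statistic(v, n, taus):
    """Cramer-von Mises form: n times the integral of v(tau)^2 over the grid,
    approximated by the trapezoid rule."""
    total = 0.0
    for k in range(1, len(taus)):
        width = taus[k] - taus[k - 1]
        total += 0.5 * width * (v[k] ** 2 + v[k - 1] ** 2)
    return n * total
```

Both functionals are continuous in the sup norm, which is what the continuous mapping argument in Part 1 relies on.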

A.4.  Proof  of  Proposition  2.  Immediate  from  assumptions.  ■ 

A.5. Proof of Theorem 3. To simplify the presentation, we assume that $A(\tau)$, $J(\tau)$, $H(\tau)$ are known. However, the case with the estimated matrices is straightforward by, e.g., using the arguments in the proof of Proposition 1 in Chernozhukov (2002) in part II of this proof.

We  begin  by  showing  Part  1  using  the  following  steps. 

Step I. By assumption, realizations of the function

$$\tau \mapsto \hat z(W,\tau)$$

belong to a Donsker set of functions. We will denote this set as

$$\{\xi(W,\tau),\ \tau\in\mathcal{T},\ \xi\in\Xi\}.$$

Consider the empirical process

$$(\tau,\xi) \mapsto G_n(\xi(\tau)),$$

which is Donsker by assumption with limit law denoted $J(P)$. Consider also its subsample realizations

$$(\tau,\xi) \mapsto G_{j,b,n}(\xi(\tau)) = \frac{1}{\sqrt{b}}\sum_{i\in I_j}\left(\xi(W_i,\tau) - E\xi(W_i,\tau)\right), \qquad j = 1,\ldots,B_n.$$
Let $J_n(P_n)$ denote the sampling distribution of $(\tau,\xi) \mapsto G_n(\xi(\tau))$, and let $L_{b,n}$ denote the subsampling distribution of $(\tau,\xi) \mapsto G_{j,b,n}(\xi(\tau))$. By Theorem 7.4.1 in Politis, Romano, and Wolf (1999),

$$\rho_L(J_n(P_n),L_{b,n}) \xrightarrow{p} 0 \quad\text{and}\quad \rho_L(J(P),L_{b,n}) \xrightarrow{p} 0, \tag{A.6}$$

where $\rho_L$ denotes the bounded-Lipschitz metric (Lévy metric) that metrizes weak convergence.

Next let

• $J_n(P_n,\xi)$ denote the sampling (outer) law of $\tau \mapsto [G_n(\xi(\tau))]$,

• $L_{b,n}(\xi)$ denote the subsampling (outer) law of $\tau \mapsto [G_{j,b,n}(\xi(\tau))]$,

• $J(P,\xi)$ denote the limit law of $\tau \mapsto [G_n(\xi(\tau))]$.

By (A.6) we have, by definition of $\rho_L$,

$$\sup_{\xi\in\Xi}\left[\rho_L(J_n(P_n,\xi),L_{b,n}(\xi))\right] \xrightarrow{p} 0 \quad\text{and}\quad \sup_{\xi\in\Xi}\left[\rho_L(J(P,\xi),L_{b,n}(\xi))\right] \xrightarrow{p} 0,$$

since the projection maps $\tau \mapsto G_n(\xi(\tau))$ are bounded, uniformly in $\xi$, Lipschitz functionals of $(\xi,\tau) \mapsto G_n(\xi(\tau))$. This means that

$$\rho_L(J_n(P_n,\hat z),L_{b,n}(\hat z)) \xrightarrow{p} 0 \quad\text{and}\quad \rho_L(J(P,z),L_{b,n}(\hat z)) \xrightarrow{p} 0, \tag{A.7}$$

provided

$$J_n(P_n,\hat z) \to J(P,z).$$

The last observation follows from the Donskerness assumed in I.4(a), the orthogonality assumed in I.4(b), and continuity with respect to the $L_2(P)$ semi-metric by I.4(a):

$$\sup_{\tau}\, E\left|G_n(\xi(\tau)) - G_n(z(\tau))\right|^2\Big|_{\xi=\hat z} \le \sup_{\tau}\left|\mathrm{Var}\left[\xi(W_i,\tau) - z(W_i,\tau)\right]\right|\Big|_{\xi=\hat z} \xrightarrow{p} 0.$$

For $f$ denoting the two- and one-sided KS and CM functionals on the empirical processes, let

• $H_n$ denote the (outer) distribution function of $f[G_n(\hat z(\tau))]$,

• $H_{b,n}$ denote the subsampling distribution function of $f[G_{j,b,n}(\hat z(\tau))]$,

• $\Gamma$ denote the distribution function of $f[G_\infty(z(\cdot))]$.

By (A.7) and the definition of $\rho_L$,

$$\rho_L(H_n,H_{b,n}) \xrightarrow{p} 0 \quad\text{and}\quad \rho_L(H_n,\Gamma) \xrightarrow{p} 0,$$

since the functionals $f(G_n(\xi(\cdot)))$ are bounded uniform Lipschitz functionals of $\tau \mapsto G_n(\xi(\cdot))$.

Now we need to convert this result into convergence of distribution functions at continuity points. $\Gamma$ is absolutely continuous (at $x > 0$ for one-sided statistics), as shown in the proof of Theorem 2. Since the statistics are real-valued, convergence with respect to the Lévy metric is equivalent to pointwise convergence at the continuity points $x$ of $\Gamma(x)$:

$$H_{b,n}(x) \xrightarrow{p} \Gamma(x).$$
Step II. Now note that we actually have the subsampling distribution $\Gamma_{b,n}$ of $f[v_{j,b,n}(\cdot)]$ and not $H_{b,n}$, but the difference between $\Gamma_{b,n}$ and $H_{b,n}$ will be shown to be small. Indeed,

$$f[G_{j,b,n}(\cdot)] - K_n \le f[v_{j,b,n}(\cdot)] \le f[G_{j,b,n}(\cdot)] + K_n,$$

where, e.g., when $f$ is the KS functional,

$$K_n = \sup_{\tau}\left\|\sqrt{b}\,[Ez(W,\tau)]_{z=\hat z}\right\| \le \sqrt{b}\cdot\sup_{\tau}\left(c\,\|\hat\vartheta(\tau) - \vartheta(\tau)\| + c'\,\|\hat r(\tau) - r(\tau)\|\right) = O_p\!\left(\sqrt{b}/\sqrt{n}\right)$$

by invoking the conditions I.4(b)-(c) and then I.3.

Thus wp $\to 1$, $1(E_n) = 1$, where $E_n = \{K_n \le \delta\}$ for any $\delta > 0$.

Given the event $E_n$, for a small $\epsilon > 0$ there is $\delta > 0$ such that $H_{b,n}(x - \epsilon)1(E_n) \le \Gamma_{b,n}(x)1(E_n) \le H_{b,n}(x + \epsilon)1(E_n)$, so that with probability tending to one:

$$H_{b,n}(x - \epsilon) \le \Gamma_{b,n}(x) \le H_{b,n}(x + \epsilon).$$

We have by Step I that $H_{b,n}(x + c) \xrightarrow{p} \Gamma(x + c)$, for $c = \epsilon$ and $c = -\epsilon$, which implies

$$\Gamma(x - \epsilon) - \epsilon \le \Gamma_{b,n}(x) \le \Gamma(x + \epsilon) + \epsilon$$

w.p. $\to 1$. Since $\epsilon$ can be set as small as we like and $\Gamma$ is continuous at the points of interest, this yields the conclusion $\Gamma_{b,n}(x) \xrightarrow{p} \Gamma(x)$.

Step III. Finally, convergence of quantiles is implied by the convergence of distribution functions at continuity points; see, e.g., Politis, Romano, and Wolf (1999).

Part 2 of Theorem 3 is immediate from Part 1 and contiguity.

Part 3 of Theorem 3 follows by steps that are identical to those in the proof of Part 1, except that we have convergence of the subsampling distribution $\Gamma_{b,n}$ to some other distribution $\Gamma' \ne \Gamma$ at the continuity points. By tightness of $\Gamma'$, $c_n(1 - \alpha) = O_p(1)$ even if $\Gamma'$ is not continuous at $\Gamma'^{-1}(1 - \alpha)$. Indeed, we have $c_n(1 - \alpha') \le c_n(1 - \alpha) \le c_n(1 - \alpha'')$, where $\alpha'$ and $\alpha''$ are picked such that $\Gamma'$ is continuous at some finite $\Gamma'^{-1}(1 - \alpha')$ and $\Gamma'^{-1}(1 - \alpha'')$ (possible by tightness). We then have, by steps like those in the proof of Part 1, $c_n(1 - \alpha') \xrightarrow{p} \Gamma'^{-1}(1 - \alpha')$ and $c_n(1 - \alpha'') \xrightarrow{p} \Gamma'^{-1}(1 - \alpha'')$.

Part 4 has already been proved in the proof of Theorem 2. ■
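
The subsampling step analyzed in the proof of Theorem 3 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' code: it takes scalar estimated scores on a finite grid of quantile indices as given, draws subsets of size $b$ without replacement, applies the KS functional to $\sqrt{b}$ times the subsample mean of the scores, and returns the empirical $(1-\alpha)$ quantile of the resulting statistics. The function names, the number of subsets, and the seed are hypothetical.

```python
import math
import random

def ks_functional(path, scale):
    """KS form: scale times the largest absolute value along the grid."""
    return scale * max(abs(x) for x in path)

def subsample_critical_value(scores, b, level, num_subsamples=200, seed=0):
    """scores[i][k] holds the estimated score of observation i at grid point k.
    Returns the (1 - level) quantile of the subsampled KS statistics."""
    rng = random.Random(seed)
    n = len(scores)
    grid_size = len(scores[0])
    stats = []
    for _ in range(num_subsamples):
        subset = rng.sample(range(n), b)  # b indices without replacement
        means = [sum(scores[i][k] for i in subset) / b
                 for k in range(grid_size)]
        stats.append(ks_functional(means, math.sqrt(b)))
    stats.sort()
    # Smallest c whose empirical distribution function is at least 1 - level.
    k = max(math.ceil((1.0 - level) * len(stats)), 1)
    return stats[k - 1]
```

Because only the scores are resampled, the estimator itself is never recomputed on a subsample, which is what keeps the procedure cheap.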

Appendix  B.  Lemmas 

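Lemma 1 (Argmax Process). Suppose that uniformly in $\pi$ in a compact set $\Pi$ and for a compact set $K$

i. $Z_n(\pi)$ is s.t. $Q_n(Z_n(\pi)|\pi) \ge \sup_{z\in K} Q_n(z|\pi) - \epsilon_n$, $\epsilon_n \searrow 0$; $Z_n(\pi) = O_p(1)$ in $\ell^\infty(\Pi)$.
ii. $Z_\infty(\pi) = \arg\sup_{z\in K} Q_\infty(z|\pi)$ is a uniquely defined continuous process.
iii. $Q_n(\cdot|\cdot) \to Q_\infty(\cdot|\cdot)$ in $\ell^\infty(K\times\Pi)$, where $Q_\infty(\cdot|\cdot)$ is continuous. Then
$Z_n(\cdot) \xrightarrow{p} Z_\infty(\cdot)$ in $\ell^\infty(\Pi)$.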
Proof. The result is well known for the non-process case. The argument is just slightly more complicated than the usual consistency arguments, cf. Amemiya (1985).

Suppose first that convergence in iii. holds uniformly in probability.

We have

$$Q_n(z|\pi) \xrightarrow{p} Q_\infty(z|\pi), \tag{B.1}$$

where the convergence is uniform in $(z,\pi)$ over compact sets. Uniformly in $\epsilon \in (c,d)$, $d > c > 0$, wp $\to 1$, and uniformly in $\pi$: [i] $Q_n(Z_n(\pi)|\pi) \ge Q_n(Z_\infty(\pi)|\pi) - \epsilon/3$ by definition, [ii] $Q_\infty(Z_n(\pi)|\pi) > Q_n(Z_n(\pi)|\pi) - \epsilon/3$ by (B.1), [iii] $Q_n(Z_\infty(\pi)|\pi) > Q_\infty(Z_\infty(\pi)|\pi) - \epsilon/3$ by (B.1). Hence wp $\to 1$

$$Q_\infty(Z_n(\pi)|\pi) > Q_n(Z_n(\pi)|\pi) - \epsilon/3 \ge Q_n(Z_\infty(\pi)|\pi) - 2\epsilon/3 > Q_\infty(Z_\infty(\pi)|\pi) - \epsilon.$$

Pick any $\delta > 0$. Let $\{B(\pi), \pi\in\Pi\}$ be a collection of balls with diameter $\delta > 0$, each centered at $Z_\infty(\pi)$. Then $\epsilon = \inf_{\pi\in\Pi}\left[Q_\infty(Z_\infty(\pi)|\pi) - \sup_{z\in K\setminus B(\pi)} Q_\infty(z|\pi)\right] > 0$ by assumption ii, and for any $\epsilon' > 0$ we can pick $c$ and $d$ so that

$$P(\epsilon \in (c,d)) \ge 1 - \epsilon'.$$

It now follows, wp becoming greater than $1 - \epsilon'$, uniformly in $\pi$,

$$Q_\infty(Z_n(\pi)|\pi) > Q_\infty(Z_\infty(\pi)|\pi) - Q_\infty(Z_\infty(\pi)|\pi) + \sup_{z\in K\setminus B(\pi)} Q_\infty(z|\pi) = \sup_{z\in K\setminus B(\pi)} Q_\infty(z|\pi).$$

Thus, wp becoming greater than $1 - \epsilon'$,

$$\sup_{\pi\in\Pi}\|Z_n(\pi) - Z_\infty(\pi)\| \le \delta.$$

But $\epsilon'$ is arbitrary, so the preceding display occurs with probability converging to one. ■

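Lemma 2 (Stochastic Expansions). Under assumptions R1-R4,

I. Uniformly in $(\alpha,\beta,\gamma,\tau)$ in $(\mathcal{A}\times\mathcal{B}\times\mathcal{G}\times\mathcal{T})$,

$$E_n[\hat g(W,\alpha,\beta,\gamma,\tau)] \xrightarrow{p} E[g(W,\alpha,\beta,\gamma,\tau)].$$

II. Uniformly in $\tau$ in $\mathcal{T}$,

$$G_n \hat f(W,\hat\alpha(\tau),\hat\beta(\tau),\hat\gamma(\tau),\tau) = G_n f(W,\alpha(\tau),\beta(\tau),0,\tau) + o_p(1),$$

for any $(\hat\alpha(\tau),\hat\beta(\tau),\hat\gamma(\tau)) \xrightarrow{p} (\alpha(\tau),\beta(\tau),0)$ uniformly in $\mathcal{T}$. Furthermore,

$$G_n f(W,\alpha(\tau),\beta(\tau),0,\tau) \Rightarrow G(\tau) \quad \text{in } \ell^\infty(\mathcal{T}),$$

where $G$ is a Gaussian process with covariance function $S(\tau,\tau')$ defined in Theorem 1.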

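Proof. We first show II. Denote $\pi = (\alpha,\beta,\gamma)$ and $\Pi = \mathcal{A}\times\mathcal{B}\times\mathcal{G}$, where $\mathcal{G}$ is a closed ball at 0. We first show that the class of functions

$$\mathcal{H} = \left\{h = (\Phi,\Psi,\pi,\tau) \mapsto \varphi_\tau(Y - D'\alpha - X'\beta - \Phi(X,Z)'\gamma)\,\Psi(X,Z),\quad \pi\in\Pi,\ \Phi\in\mathcal{F},\ \Psi\in\mathcal{F}\right\}$$

is Donsker, where $\mathcal{F}$ is defined in R4.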
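The bracketing number of $\mathcal{F}$ by Cor 2.7.4 in van der Vaart and Wellner (1996) satisfies

$$\log N_{[\cdot]}(\epsilon,\mathcal{F},L_2(P)) = O\!\left(\left(\frac{1}{\epsilon}\right)^{\dim(x,z)/\eta}\right) = O\!\left(\left(\frac{1}{\epsilon}\right)^{2+\delta'}\right)$$

for some $\delta' < 0$. Thus $\mathcal{F}$ is Donsker with a constant envelope. By Cor 2.7.4 in van der Vaart and Wellner (1996) the log of the bracketing number of

$$\mathcal{X} = \left\{(\Phi,\pi) \mapsto (D'\alpha + X'\beta + \Phi(X,Z)'\gamma),\quad \pi\in\Pi,\ \Phi\in\mathcal{F}\right\}$$

satisfies

$$\log N_{[\cdot]}(\epsilon,\mathcal{X},L_2(P)) = O\!\left(\left(\frac{1}{\epsilon}\right)^{2+\delta''}\right)$$

for some $\delta'' < 0$. Exploiting the monotonicity and boundedness of the indicator function and assumption R3, the log of the bracketing number of

$$\mathcal{V} = \left\{(\Phi,\pi) \mapsto 1(Y < D'\alpha + X'\beta + \Phi(X,Z)'\gamma),\quad \pi\in\Pi,\ \Phi\in\mathcal{F}\right\}$$

satisfies

$$\log N_{[\cdot]}(\epsilon,\mathcal{V},L_2(P)) = O\!\left(\left(\frac{1}{\epsilon}\right)^{2+\delta''}\right)$$

as well. Therefore $\mathcal{V}$ is Donsker, since it has a constant envelope by R1 and R4.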
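Class $\mathcal{H}$ is formed by taking products and sums of the Donsker classes

$$\mathcal{F},\ \mathcal{V},\ \text{and } \mathcal{T} = \{\tau \mapsto \tau\}:$$

$$\mathcal{H} = \mathcal{T}\cdot\mathcal{F} - \mathcal{V}\cdot\mathcal{F},$$

which is uniformly Lipschitz over $(\mathcal{T}\times\mathcal{F}\times\mathcal{V})$, and by Theorem 2.10.6 in van der Vaart and Wellner (1996) $\mathcal{H}$ is Donsker.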

Now we show (II) using the established Donskerness. Define the process

$$h = (\Phi,\Psi,\pi,\tau) \mapsto G_n\varphi_\tau(Y - D'\alpha - X'\beta - \Phi(X,Z)'\gamma)\,\Psi(X,Z).$$

This process is Donsker (asymptotically Gaussian). Therefore the process

$$\tau \mapsto G_n\varphi_\tau(Y - D'\alpha(\tau) - X'\beta(\tau))\,\Psi(\tau,X,Z)$$

is also Donsker by linearity in $\tau$ and by the uniform Hölder property of

$$\tau \mapsto (\alpha(\tau)',\beta(\tau)',\Phi(\tau,X,Z),\Psi(\tau,X,Z))$$

in $\tau$ with respect to the supremum norm, by R3 and R4. (To check the Donskerness, it is easy to verify (i) the definition of stochastic equicontinuity in $\tau$ with respect to the $L_2(P)$ semi-metric and (ii) finite-dimensional asymptotic normality by the Lindeberg-Lévy CLT.) Thus we have

$$G_n\varphi_\tau(Y - D'\alpha(\tau) - X'\beta(\tau))\,\Psi(\tau,X,Z) \Rightarrow G(\tau),$$

where $G(\tau)$ has covariance function $S(\tau,\tau')$.

Since $\hat\Phi(\cdot) \to \Phi(\cdot)$ and $\hat\Psi(\cdot) \to \Psi(\cdot)$ uniformly over compacts and $\hat\pi(\tau) - \pi(\tau) \to 0$ uniformly in $\tau$, we have by R3 and R4:

$$\delta_n = \sup_{\tau}\rho(\hat h(\tau),h(\tau)) \xrightarrow{p} 0,$$
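where $\rho$ is the $L_2(P)$ semimetric on $\mathcal{H}$:

$$\rho(\tilde h,h) = \mathrm{Var}\left[\varphi_{\tau}(Y - D'\alpha - X'\beta - \Phi(X,Z)'\gamma)\,\Psi(X,Z) - \varphi_{\tilde\tau}(Y - D'\tilde\alpha - X'\tilde\beta - \tilde\Phi(X,Z)'\tilde\gamma)\,\tilde\Psi(X,Z)\right],$$

so that

$$\begin{aligned}
&\sup_{\tau}\left\|G_n\varphi_\tau(Y - D'\hat\alpha(\tau) - X'\hat\beta(\tau) - \hat\Phi(\tau,X,Z)'\hat\gamma(\tau))\,\hat\Psi(\tau,X,Z) - G_n\varphi_\tau(Y - D'\alpha(\tau) - X'\beta(\tau))\,\Psi(\tau,X,Z)\right\| \\
&\quad\le \sup_{\rho(\tilde h,h)\le\delta_n}\left\|G_n\varphi_{\tilde\tau}(Y - D'\tilde\alpha - X'\tilde\beta - \tilde\Phi(X,Z)'\tilde\gamma)\,\tilde\Psi(X,Z) - G_n\varphi_\tau(Y - D'\alpha - X'\beta - \Phi(X,Z)'\gamma)\,\Psi(X,Z)\right\| \xrightarrow{p} 0
\end{aligned}$$

as $\delta_n \to 0$, by stochastic equicontinuity of $h \mapsto G_n\varphi_\tau(Y - D'\alpha - X'\beta - \Phi(X,Z)'\gamma)\,\Psi(X,Z)$ (which is a part of being Donsker).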

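Having shown II, a simple way to show I. is to note that the functions

$$\mathfrak{G} = \left\{(\Phi,V,\alpha,\beta,\gamma,\tau) \mapsto \rho_\tau(Y - D'\alpha - X'\beta - \Phi(X,Z)'\gamma)\,V(X,Z)\right\}$$

are uniformly Lipschitz over

$$(\mathcal{F}\times\mathcal{F}\times\mathcal{A}\times\mathcal{B}\times\mathcal{G}\times\mathcal{T}),$$

which by Theorem 2.10.6 in van der Vaart and Wellner (1996) and assumption R1 means that $\mathfrak{G}$ is Donsker. From this we have a uniform LLN

$$\sup\left\|E_n\rho_\tau(Y - D'\alpha - X'\beta - \Phi(X,Z)'\gamma)\,V(X,Z) - E\rho_\tau(Y - D'\alpha - X'\beta - \Phi(X,Z)'\gamma)\,V(X,Z)\right\| \xrightarrow{p} 0,$$

and by uniform consistency of $\hat\Phi$ and $\hat V$ and R4 we have

$$\sup_{(\alpha,\beta,\gamma,\tau)\in(\mathcal{A}\times\mathcal{B}\times\mathcal{G}\times\mathcal{T})}\left\|E\left[\rho_\tau(Y - D'\alpha - X'\beta - \Phi(\tau,X,Z)'\gamma)\,V(\tau,X,Z)\right]_{\Phi=\hat\Phi,\,V=\hat V} - E\rho_\tau(Y - D'\alpha - X'\beta - \Phi(\tau,X,Z)'\gamma)\,V(\tau,X,Z)\right\| \xrightarrow{p} 0. \qquad ■$$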

Appendix  C.  Verification  of  Linear  Representations 

Proposition 3. The conditions I.3 and I.4 are verified for the proposed implementation in Examples 1-4 under conditions R1-R4.

Proof. In Example 1, in the test of equality of distributions, I.3 is satisfied for $\hat\theta(\cdot)$ by Theorem 1, by contiguity of $P_n$ relative to $P$ under which Theorem 1 was proven. Since $r = 0$,

$$z_i(\tau) = R(\tau)\left[J(\tau)^{-1}\,l_i(\tau,\theta(\tau))\,\Psi_i(\tau)\right],$$

where

$$l_i(\tau,\theta(\tau)) = \left(\tau - 1(Y_i < D_i'\alpha(\tau) + X_i'\beta(\tau))\right), \qquad \Psi_i(\tau) = V_i(\tau)\,[\Phi_i(\tau)',X_i']'. \tag{C.1}$$
Condition I.4(a) is checked in the proof of Lemma 2 in Appendix B, cf. the class of functions $\mathcal{H}$. Condition I.4(b) holds by Proposition 1 and since $\Psi_i$ is a function of $(X_i,Z_i)$ only. Condition I.4(c) holds by the bounded density condition R3.

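In Example 2, in the test of constant effect, $\hat r(\cdot) = \hat\theta(\tfrac12)$ is an IV-QR estimate. Thus, for $l_i(\cdot)$ defined in (C.1),

$$z_i(\tau) = R(\tau)\left[J(\tau)^{-1}\,l_i(\tau,\theta(\tau))\,\Psi_i(\tau) - J(\tfrac12)^{-1}\,l_i(\tfrac12,\theta(\tfrac12))\,\Psi_i(\tfrac12)\right],$$

i.e. $d_i(\tau,r(\tau))\,\Upsilon_i(\tau) = l_i(\tfrac12,\theta(\tfrac12))\,\Psi_i(\tfrac12)$. Thus I.3-I.4 hold by the preceding argument.

In Example 3, the test of stochastic dominance, $r$ is also known, so the situation is like that in Example 1: for $l_i$ defined in (C.1),

$$z_i(\tau) = R(\tau)\left[J(\tau)^{-1}\,l_i(\tau,\theta(\tau))\,\Psi_i(\tau)\right],$$

so I.3 and I.4 are verified.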

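In Example 4, in the test of no-endogeneity, the estimate of $r(\tau)$ is given by the ordinary QR of $Y$ on $D, X$, denoted as $\hat\vartheta(\tau)$. In this case, under conditions R1-R4, the estimator $\hat\vartheta(\tau)$ satisfies I.3, e.g. Portnoy (1991):

$$\sqrt{n}\,(\hat\vartheta(\cdot) - \vartheta(\cdot)) = H(\cdot)^{-1}\,n^{-1/2}\sum_{i=1}^{n} d_i(\cdot,\vartheta(\cdot)) + o_{P_n}(1),$$
$$d_i(\tau,\vartheta(\tau)) = \left(\tau - 1(Y_i < \tilde X_i'\vartheta(\tau))\right)\tilde X_i, \qquad \tilde X_i = (D_i',X_i')'.$$

Thus the score is given by

$$z_i(\tau) = R(\tau)\left[J(\tau)^{-1}\,l_i(\tau,\theta(\tau))\,\Psi_i(\tau) - H(\tau)^{-1}\,d_i(\tau,\vartheta(\tau))\right].$$

The conditions I.3 and I.4 for $l_i(\tau,\theta(\tau))\Psi_i(\tau)$ are checked above. As for $d_i(\tau,\vartheta)$, the proof of Lemma 2 checks I.4(a) (put $\tilde X_i$ in place of $\Psi_i$ and $\gamma = 0$). Note that $E\,d_i(\tau,\vartheta(\tau)) = 0$ from the definition of the quantile regression coefficient $\vartheta(\tau)$, so I.4(b) is satisfied, and I.4(c) holds by the bounded density condition R3. ■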
FIGURE 1. The sample size is 111. Coefficient estimates are on the vertical axis, while the quantile index is on the horizontal axis. The shaded region is the 90% confidence band estimated using robust standard errors. The first panel contains estimates of the price elasticity of demand obtained through instrumental variables quantile regression. The second panel presents estimates of the effect of ln(P) on ln(Q) obtained through standard quantile regression. Estimates were computed for $\tau \in [.05, .95]$.
FIGURE 2. The sample size is 329,509. Coefficient estimates are on the vertical axis, while the quantile index is on the horizontal axis. The shaded region is the 95% confidence band estimated using robust standard errors. The first panel contains estimates of the returns to schooling obtained through instrumental variables quantile regression. The second panel presents estimates of the effect of years of schooling on earnings obtained through standard quantile regression. For comparison, the dashed line in the first panel plots the schooling coefficient estimated through standard quantile regression. All estimates were computed for $\tau \in [.05, .95]$.